Preface



The field of cryptology witnessed a revolution in the late seventies. Since then
it has expanded into an important and exciting area of research. Over the
last two decades, India neither participated actively in nor contributed
significantly to the development of this field. Recently, however, a number of
active research groups engaged in important research and development work
have crystallized in different parts of India. As a result, their interaction with
the international crypto community has become necessary. Against this backdrop,
it was proposed that a conference on cryptology, INDOCRYPT, be organized
for the first time in India. The Indian Statistical Institute was instrumental in
hosting this conference. INDOCRYPT has generated a large amount of
enthusiasm among the Indian as well as the international crypto communities. An
INDOCRYPT steering committee has been formed and the committee has plans
to make INDOCRYPT an annual event.

For INDOCRYPT 2000, the program committee considered a total of 54 pa- 
pers and out of these 25 were selected for presentation. The conference program 
also included two invited lectures by Prof. Adi Shamir and Prof. Eli Biham. 

These proceedings include the revised versions of the 25 papers accepted by 
the program committee. These papers were selected from all the submissions 
based on originality, quality and relevance to the field of Cryptology. Revisions 
were not checked and the authors bear the full responsibility for the contents of 
the papers in these proceedings. 

The selection of the papers was a very difficult and challenging task. I wish to 
thank all the Program Committee members who did an excellent job in reviewing 
the papers and providing valuable feedback to the authors. Each submission 
was reviewed by at least three (only a few by two) reviewers. The program 
committee was assisted by many colleagues who reviewed submissions in their 
areas of expertise. The list of external reviewers has been provided separately. 
My thanks go to them all. 

My sincere thanks go to Springer-Verlag, in particular to Mr. Alfred Hofmann,
for the inclusion of the seminar proceedings in their prestigious series
Lecture Notes in Computer Science. I am also indebted to Prof. Jacques Stern, Prof.
Jennifer Seberry, and Prof. Cunsheng Ding for their valuable advice and
suggestions towards making the publication of the proceedings of INDOCRYPT
2000 possible.

I gratefully acknowledge financial support from different organizations
towards making INDOCRYPT 2000 a success. The contributors were AgniRoth
(California, USA), Tata Consultancy Services (Calcutta, India), CMC Limited
(New Delhi, India), Cognizant Technology Solutions (Calcutta, India), Gemplus
(Bangalore, India), Ministry of Information Technology (Govt. of India), and
IDRBT (Hyderabad, India). I once again thank them all.

In organizing the scientific program and putting together these proceedings I 
have been assisted by many people. In particular I would like to thank Subhamoy 
Maitra, Sarbani Palit, Arindom De, Kishan Chand Gupta, and Sandeepan Chowd- 
hury. 

Finally I wish to thank all the authors who submitted papers, making this 
conference possible, and the authors of successful papers for updating their pa- 
pers in a timely fashion, making the production of these proceedings possible. 



December 2000 Bimal Roy 
The Correlation of a Boolean Function
with Its Variables * 



Dingyi Pei and Wenliang Qin 

State Key Laboratory of Information Security, 
Graduate School of Chinese Academy of Science 



Abstract. The correlation of a Boolean function with its variables is
closely related to the correlation attack on stream ciphers. The Walsh
transformation is the main tool to study the correlation of Boolean
functions. The Walsh transformation of a Boolean function with $r$ variables
has $2^r$ coefficients. Let $k$ denote the number of non-zero coefficients of the
Walsh transformation. The paper studies the functions with $1 \le k \le 8$.
It is proved that the functions with $k = 1$ are the linear functions only,
that there are no functions with $k = 2, 3, 5, 6, 7$, and finally we construct all
functions with $k = 4$ or $8$.

keywords: Boolean function, correlation, Walsh transformation, stream 
cipher 



1 Introduction 

Boolean functions are widely used in communication and cryptography. It is
important in practice to choose Boolean functions with desired properties. This
paper studies the correlation of a Boolean function with its variables, a
property closely related to the correlation attack on stream ciphers [1-4].

Let $F_2$ be the field with two elements, and let $f(x) = f(x_0, \dots, x_{r-1})$ be a
Boolean function from $F_2^r$ to $F_2$. There is a one-to-one correspondence
between the elements $(x_0, x_1, \dots, x_{r-1})$ of $F_2^r$ and the integers $0 \le x < 2^r$, defined
by $x = x_0 + 2x_1 + \cdots + 2^{r-1}x_{r-1}$. Let $i = i_0 + 2i_1 + \cdots + 2^{r-1}i_{r-1}$ ($i_j = 0$ or $1$)
be another integer and put $i \cdot x = i_0x_0 + \cdots + i_{r-1}x_{r-1}$. The Walsh transformation
of the function $f(x)$ is defined by

$$a(i) = \sum_{x=0}^{2^r-1} (-1)^{f(x) + i \cdot x}, \qquad 0 \le i < 2^r \tag{1}$$

($f(x) + i \cdot x$ is understood as a real number), which plays an important role
in the study of Boolean functions. It is easy to see that $a(i)$ is the difference
between the number of $x$ for which $f(x) = i \cdot x$ and the number of $x$ for which
$f(x) \ne i \cdot x$ (with $i \cdot x$ reduced modulo 2). The larger the absolute value of $a(i)$, the stronger the
correlation of $f(x)$ with $i \cdot x$. Considering the correlation attack on stream ciphers,
* Supported by NNSF under contract No. 19931010 
we wish to find Boolean functions for which the value $\max_i |a(i)|$ achieves its
minimum.

It is well known that

$$\sum_{i=0}^{2^r-1} a(i)^2 = 4^r,$$

hence

$$\max_i |a(i)| \ge 2^{r/2}.$$

When equality holds, the function $f$ is called a bent function, which is not
balanced. We hope to find balanced Boolean functions with the value $\max_i |a(i)|$
as small as possible.

Let $k$ denote the number of non-zero coefficients $a(i)$ ($0 \le i < 2^r$). The
main result of this paper is the determination of all Boolean functions with $k \le 8$. It is
possible to generalize the method of this paper to larger $k$.
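
As a small illustration of definition (1), the sketch below computes the Walsh coefficients of a truth table by direct summation, then counts the non-zero coefficients $k$ and evaluates the sum of squares. The function name `walsh` and the sample function $f = x_0x_1 + x_2$ are assumptions chosen for the example, not taken from the paper.

```python
def walsh(f_table):
    """Walsh coefficients a(0..2^r - 1) of a truth table indexed by the integer x."""
    n = len(f_table)
    coeffs = []
    for i in range(n):
        total = 0
        for x in range(n):
            dot = bin(i & x).count("1") & 1      # i . x reduced mod 2
            total += 1 if f_table[x] == dot else -1
        coeffs.append(total)
    return coeffs

# Hypothetical example: f(x0, x1, x2) = x0*x1 + x2, with x = x0 + 2*x1 + 4*x2.
r = 3
f = [((x & 1) & (x >> 1 & 1)) ^ (x >> 2 & 1) for x in range(2 ** r)]
a = walsh(f)
k = sum(1 for v in a if v != 0)          # number of non-zero coefficients
parseval = sum(v * v for v in a)         # compared against 4^r in the test
```

The direct double loop costs $4^r$ operations, which is fine for illustration; a fast Walsh-Hadamard transform would be the usual choice for larger $r$.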

Theorem 1 Let $\{a(i) \mid 0 \le i < 2^r\}$ be the Walsh transformation of the Boolean
function $f : F_2^r \to F_2$ and $k = \#\{a(i) \mid a(i) \ne 0,\ 0 \le i < 2^r\}$. Then

(1) There is no Boolean function with $k = 2, 3, 5, 6, 7$.

(2) All functions $f(x)$ with $k = 1$ are linear functions $f(x) = c_0x_0 + \cdots + c_{r-1}x_{r-1}$ ($c_i \in F_2$).

(3) All functions $f(x)$ with $k = 4$ can be constructed in the following way. Let

$$V_4 = \{(c_0, c_1, c_2, c_3) \in F_2^4 \mid c_0 + c_1 + c_2 + c_3 = 0\}$$

be the subspace of $F_2^4$. For each $0 \le j < r$, take $(i_0(j), i_1(j), i_2(j), i_3(j)) \in V_4$
such that

$$i_l = i_l(0) + 2i_l(1) + \cdots + 2^{r-1}i_l(r-1), \qquad l = 0, 1, 2, 3$$

are 4 different integers. Define $f(x)$ by

$$(-1)^{f(x)} = \pm\tfrac{1}{2}\left((-1)^{i_0 \cdot x} + (-1)^{i_1 \cdot x} + (-1)^{i_2 \cdot x} - (-1)^{i_3 \cdot x}\right),$$

then the Walsh transformation of $f(x)$ has four non-zero coefficients

$$(a(i_0), a(i_1), a(i_2), a(i_3)) = \pm(2^{r-1}, 2^{r-1}, 2^{r-1}, -2^{r-1}).$$

(4) All functions $f(x)$ with $k = 8$ can be constructed in the following way. Put

$e_0 = (1,1,1,1,1,1,1,1)$,
$e_1 = (0,0,0,0,1,1,1,1)$,
$e_2 = (0,0,1,1,0,0,1,1)$,
$e_3 = (0,1,0,1,0,1,0,1)$.

Let $V_8$ be the subspace of solutions of the equation system

$$e_0 \cdot x = e_1 \cdot x = e_2 \cdot x = e_3 \cdot x = 0.$$

For each $0 \le j < r$, take $(i_0(j), \dots, i_7(j)) \in V_8$ such that
$i_l = \sum_{j=0}^{r-1} i_l(j) \cdot 2^j$ ($0 \le l < 8$) are 8 different integers. Define $f_1(x)$ and $f_2(x)$ by

$$(-1)^{f_1(x)} = \pm\tfrac{1}{4}\left((-1)^{i_0 \cdot x} + (-1)^{i_1 \cdot x} + (-1)^{i_2 \cdot x} + (-1)^{i_3 \cdot x} + (-1)^{i_4 \cdot x} + (-1)^{i_5 \cdot x} + (-1)^{i_6 \cdot x} - 3(-1)^{i_7 \cdot x}\right),$$

$$(-1)^{f_2(x)} = \pm\tfrac{1}{4}\left((-1)^{i_0 \cdot x} + (-1)^{i_1 \cdot x} + (-1)^{i_2 \cdot x} + (-1)^{i_3 \cdot x} - (-1)^{i_4 \cdot x} - (-1)^{i_5 \cdot x} - (-1)^{i_6 \cdot x} + 3(-1)^{i_7 \cdot x}\right).$$

Then the Walsh transformation of $f_1(x)$ has eight non-zero coefficients

$$(a(i_0), a(i_1), \dots, a(i_7)) = \pm(2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, -3 \cdot 2^{r-2}),$$

and the Walsh transformation of $f_2(x)$ has eight non-zero coefficients

$$(a(i_0), a(i_1), \dots, a(i_7)) = \pm(2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, -2^{r-2}, -2^{r-2}, -2^{r-2}, 3 \cdot 2^{r-2}).$$
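
The constructions in parts (3) and (4) of the theorem can be checked numerically on small cases. The sketch below is only an illustration under stated assumptions: the helper names (`build`, `integers_from_columns`) and the particular choice of columns from $V_4$ and $V_8$ are hypothetical, and the sign in front of the defining sum is fixed to $+$.

```python
def dot(i, x):
    """Inner product i . x over F_2 of two integers read as bit vectors."""
    return bin(i & x).count("1") & 1

def walsh(table):
    n = len(table)
    return [sum(1 if table[x] == dot(i, x) else -1 for x in range(n)) for i in range(n)]

def integers_from_columns(columns):
    """columns[j] = (i_0(j), ..., i_{m-1}(j)); returns [i_0, ..., i_{m-1}]."""
    m = len(columns[0])
    return [sum(col[l] << j for j, col in enumerate(columns)) for l in range(m)]

def build(r, idx, weights, denom):
    """Truth table of f with (-1)^f(x) = (1/denom) * sum_l weights[l] * (-1)^(i_l . x)."""
    table = []
    for x in range(2 ** r):
        s = sum(w * (-1) ** dot(i, x) for i, w in zip(idx, weights))
        assert abs(s) == denom           # otherwise f would not be Boolean
        table.append(0 if s > 0 else 1)
    return table

E0 = (1, 1, 1, 1, 1, 1, 1, 1)
E1 = (0, 0, 0, 0, 1, 1, 1, 1)
E2 = (0, 0, 1, 1, 0, 0, 1, 1)
E3 = (0, 1, 0, 1, 0, 1, 0, 1)

# k = 4, r = 3: every column has even weight, so it lies in V_4.
idx4 = integers_from_columns([(0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 1)])
a4 = walsh(build(3, idx4, (1, 1, 1, -1), 2))

# k = 8, r = 4: the columns e1, e2, e3, e0 are themselves solutions of the system.
idx8 = integers_from_columns([E1, E2, E3, E0])
a_f1 = walsh(build(4, idx8, (1, 1, 1, 1, 1, 1, 1, -3), 4))
a_f2 = walsh(build(4, idx8, (1, 1, 1, 1, -1, -1, -1, 3), 4))
```

The assertion inside `build` is the numerical counterpart of the claim that the weighted sum always equals $\pm 2$ (respectively $\pm 4$) when the columns come from $V_4$ (respectively $V_8$).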



2 Some Lemmas 



Lemma 1 Let $\{a(i)\}$ be the Walsh transformation of $f(x)$, then

$$\sum_{i=0}^{2^r-1} a(i)(-1)^{i \cdot j} = 2^r(-1)^{f(j)}, \tag{2}$$

$$\sum_{i=0}^{2^r-1} a(i)^2 = 4^r. \tag{3}$$


Let $I = (l_0, l_1, \dots, l_{u-1}) \subset (0, 1, \dots, r-1)$ ($u < r$) and let $(l_u, \dots, l_{r-1})$ be the
complement of $I$ in $(0, 1, \dots, r-1)$. Write $x_i$ also as $x(i)$. Fixing $x(l_t) = y(l_t)$ ($u \le t < r$)
in $x = (x(0), \dots, x(r-1))$, $f(x)$ becomes a Boolean function of the $u$ variables
$x(l_t)$ ($0 \le t < u$) with Walsh transformation

$$\sum_{x(l_0), \dots, x(l_{u-1})} (-1)^{f\left(\sum_{t=0}^{u-1} x(l_t)2^{l_t} + \sum_{t=u}^{r-1} y(l_t)2^{l_t}\right) + \sum_{t=0}^{u-1} x(l_t)i(l_t)} = 2^{-(r-u)} \sum_{i(l_u), \dots, i(l_{r-1})} a(i)(-1)^{\sum_{t=u}^{r-1} y(l_t)i(l_t)}.$$

Denote the summation on the right side by $S_I(i(l_0), \dots, i(l_{u-1}); y(l_u), \dots, y(l_{r-1}))$;
$S_I(i(l_0), \dots, i(l_{u-1}); 0, \dots, 0)$ is also written as
$S_I(i(l_0), \dots, i(l_{u-1}))$. We have by Lemma 1 that

$$\sum_{i(l_0), \dots, i(l_{u-1})} S_I(i(l_0), \dots, i(l_{u-1}); y(l_u), \dots, y(l_{r-1}))(-1)^{\sum_{t=0}^{u-1} i(l_t)x(l_t)} = 2^r(-1)^{f(x)} \tag{4}$$

and

$$\sum_{i(l_0), \dots, i(l_{u-1})} S_I^2(i(l_0), \dots, i(l_{u-1}); y(l_u), \dots, y(l_{r-1})) = 4^r. \tag{5}$$

Note that (2)-(5) are the equalities satisfied by $\{a(i)\}$.

Lemma 2 The diophantine equation

$$x_1^2 + x_2^2 + x_3^2 + x_4^2 = 4^r$$

has the solutions $(\pm 2^r, 0, 0, 0)$ and $(\pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1})$, and these are
all of its solutions (up to the order of the coordinates).

Proof. Suppose $(a_1, a_2, a_3, a_4)$ is a solution, $2^t \,\|\, \gcd(a_1, a_2, a_3, a_4)$ and $a_i = 2^t y_i$.
At least one of the $y_i$ ($1 \le i \le 4$) is odd, and $y_1^2 + y_2^2 + y_3^2 + y_4^2 = 4^{r-t}$, hence
we have $t \le r$. If $t = r$, then one of the $y_i$ is $\pm 1$ and the others are zero, so
$(a_1, a_2, a_3, a_4) = (\pm 2^r, 0, 0, 0)$. If $t = r - 1$, then $y_i = \pm 1$ ($1 \le i \le 4$) and
$(a_1, a_2, a_3, a_4) = (\pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1})$. If $t \le r - 2$, then

$$y_1^2 + y_2^2 + y_3^2 + y_4^2 \equiv 0 \pmod 8.$$

This is impossible since $y^2 \equiv 1 \pmod 8$ if $y$ is odd and $y^2 \equiv 0$ or $4 \pmod 8$ if $y$ is
even.
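
Lemma 2 is easy to confirm by exhaustive search for small $r$. The sketch below is only a sanity check, not part of the proof; the function names are chosen for the example.

```python
from itertools import product

def four_square_solutions(r):
    """All integer 4-tuples with x1^2 + x2^2 + x3^2 + x4^2 == 4^r (brute force)."""
    bound = 2 ** r
    target = 4 ** r
    rng = range(-bound, bound + 1)
    return {v for v in product(rng, repeat=4) if sum(c * c for c in v) == target}

def shapes(solutions):
    """Sorted absolute values of each solution, with duplicates removed."""
    return {tuple(sorted(abs(c) for c in v)) for v in solutions}

def expected_shapes(r):
    return {(0, 0, 0, 2 ** r), (2 ** (r - 1),) * 4}
```

For each $r$ the search finds 24 signed, ordered solutions: 8 of the form $(\pm 2^r, 0, 0, 0)$ with the non-zero entry in any position, and 16 sign patterns of $(\pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1}, \pm 2^{r-1})$.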

Lemma 3 Suppose $k \ge 5$ and $I = (i, j) \subset \{0, 1, \dots, r-1\}$. If one of the $S_I(u, v)$ ($u, v = 0, 1$)
is a non-zero $a(i)$, then the other three of them cannot be a sum of two or
three non-zero $a(i)$.

Proof. Assume $a(i_t) \ne 0$ ($0 \le t < k$). We have by (3)

$$\sum_{t=0}^{k-1} a(i_t)^2 = 4^r. \tag{6}$$

Without loss of generality we may assume that $I = (0, 1)$ and $S_{(0,1)}(0, 0) = a(i_0)$.
Since

$$S_{(0,1)}^2(0,0) + S_{(0,1)}^2(0,1) + S_{(0,1)}^2(1,0) + S_{(0,1)}^2(1,1) = 4^r$$

by (5) and $S_{(0,1)}^2(0,0) = a^2(i_0) < 4^r$, we have

$$S_{(0,1)}^2(0,0) = S_{(0,1)}^2(0,1) = S_{(0,1)}^2(1,0) = S_{(0,1)}^2(1,1) = 4^{r-1}$$

(Lemma 2). Suppose

$$S_{(0,1)}(0,1) = a(i_1) + a(i_2) + \cdots + a(i_s);$$

we should prove that $s \ne 2$ or $3$. Similarly, we can prove the same conclusion for
$S_{(0,1)}(1,0)$ and $S_{(0,1)}(1,1)$.

Assume that $s = 2$ first. Since $i_1 \ne i_2$, there exists $2 \le j < r$ such that
$i_1(j) \ne i_2(j)$. We can assume $j = 2$ and $i_1(2) = 0$, $i_2(2) = 1$.

$$\begin{array}{c|ccc}
 & 0 & 1 & 2 \\ \hline
i_0 & 0 & 0 & \\
i_1 & 0 & 1 & 0 \\
i_2 & 0 & 1 & 1
\end{array}$$

We know that $S_{(0,1)}^2(0,1) = (a(i_1) + a(i_2))^2 = 4^{r-1}$. Similarly, by (5) and
Lemma 2, we have $S_{(0,1)}^2(0,1; 1,0,\dots,0) = (a(i_1) - a(i_2))^2 = 4^{r-1}$. It follows
that $a(i_1)a(i_2) = 0$. This is impossible.

Assume that $s = 3$ next. Since $i_1, i_2, i_3$ are different from each other, we may
assume $i_1(2) = 0$, $i_2(2) = i_3(2) = 1$, and $i_2(3) = 0$, $i_3(3) = 1$. Suppose $i_1(3) = 0$
(the case $i_1(3) = 1$ is proved similarly).

$$\begin{array}{c|cccc}
 & 0 & 1 & 2 & 3 \\ \hline
i_0 & 0 & 0 & & \\
i_1 & 0 & 1 & 0 & 0 \\
i_2 & 0 & 1 & 1 & 0 \\
i_3 & 0 & 1 & 1 & 1
\end{array}$$

We can show in the same way as above that

$$S_{(0,1)}^2(0,1) = (a(i_1) + a(i_2) + a(i_3))^2 = 4^{r-1},$$

$$S_{(0,1)}^2(0,1; 1,0,\dots,0) = (a(i_1) - a(i_2) - a(i_3))^2 = 4^{r-1},$$

$$S_{(0,1)}^2(0,1; 0,1,0,\dots,0) = (a(i_1) + a(i_2) - a(i_3))^2 = 4^{r-1}.$$

It follows that $a(i_1)^2 = a(i_2)^2 = a(i_3)^2 = 4^{r-1}$, and we already have $S_{(0,1)}^2(0,0) =
a(i_0)^2 = 4^{r-1}$; this is contrary to (6).



3 Case of $1 \le k \le 4$

Taking $f(x) + 1$ instead of $f(x)$ if necessary, we can assume $f(0) = 0$.

Suppose $k = 1$. There exists an integer $0 \le i_0 < 2^r$ such that $a(i_0) = \pm 2^r$
and $a(i) = 0$ when $i \ne i_0$. Hence we have $(-1)^{f(j)} = (-1)^{i_0 \cdot j}$ by (2). It follows
that $f(j) = i_0 \cdot j$ is a linear function.

It is easy to see by Lemma 2 and (3) that $k \ne 2, 3$.

Suppose $k = 4$, $a(i_0), a(i_1), a(i_2), a(i_3)$ are non-zero, and the other $a(i) = 0$.
Similarly we know $a(i_t) = \pm 2^{r-1}$ ($0 \le t \le 3$) by Lemma 2 and (3). Taking
$j = 0$ in (2) we get

$$a(i_0) + a(i_1) + a(i_2) + a(i_3) = 2^r.$$

Therefore $(a(i_0), a(i_1), a(i_2), a(i_3)) = (2^{r-1}, 2^{r-1}, 2^{r-1}, -2^{r-1})$ (ignoring the
order). Put

$$i_l = \sum_{s=0}^{r-1} i_l(s)2^s, \qquad l = 0, 1, 2, 3;$$

we can show that

$$(i_0(s), i_1(s), i_2(s), i_3(s)) \in V_4, \qquad 0 \le s < r. \tag{7}$$

In fact, if for some $s$ (take $s = 0$) it is the case that $i_0(0) = 0$, $i_1(0) = i_2(0) = i_3(0) =
1$, then $S_{(0)}(0) = \pm 2^{r-1}$ and $S_{(0)}(1) = 2^{r-1}$ or $3 \cdot 2^{r-1}$. This is contrary to (5).

Conversely, suppose the condition (7) holds. For any $0 \le j < 2^r$, $(i_0 \cdot j, i_1 \cdot j,
i_2 \cdot j, i_3 \cdot j)$ belongs to $V_4$, therefore

$$(-1)^{i_0 \cdot j} + (-1)^{i_1 \cdot j} + (-1)^{i_2 \cdot j} - (-1)^{i_3 \cdot j} = \pm 2,$$

and the conclusion (3) of the Theorem is true.

4 Case of $5 \le k \le 8$

Suppose $a(i_t)$ ($0 \le t < k$) are non-zero and $i_t(0)$ ($0 \le t < k$) are not all the
same.

(i) Suppose there is exactly one 0 among $i_t(0)$ ($0 \le t < k$). (If there is exactly
one 1, the case can be discussed in the same way. We will not consider the
symmetrical cases obtained by interchanging 0 and 1 in the following.) Assume
$i_0(0) = 0$, $i_t(0) = 1$ ($0 < t < k$). Then $0 < S_{(0)}(0)^2 = a^2(i_0) < 4^r$, which contradicts
$S_{(0)}^2(0) + S_{(0)}^2(1) = 4^r$ (Lemma 2).

(ii) Suppose there are exactly three 0 among $i_t(0)$ ($0 \le t < k$). Assume $i_0(0) =
i_1(0) = i_2(0) = 0$, and $i_0(1) = 0$, $i_1(1) = i_2(1) = 1$. Then $S_{(0,1)}(0,0) = a(i_0)$,
$S_{(0,1)}(0,1) = a(i_1) + a(i_2)$, which is impossible by Lemma 3.

So far we have proved $k \ne 5$. Suppose $6 \le k \le 8$ in the following.

(iii) Suppose there are exactly two 0 among $i_t(0)$ ($0 \le t < k$). Assume $i_0(0) =
i_1(0) = 0$ and $i_0(1) = 0$, $i_1(1) = 1$. Using (i), (ii) already proved above and
Lemma 3 (take $(i,j) = (0,1)$), we need only consider the case that there is
only one 0 among $i_t(1)$ ($2 \le t < k$). Assume $i_2(1) = 0$, $i_t(1) = 1$ ($3 \le t < k$).
Then $S_{(0,1)}(0,0) = a(i_0)$, $S_{(0,1)}(0,1) = a(i_1)$, $S_{(0,1)}(1,0) = a(i_2)$, $S_{(0,1)}(1,1) =
a(i_3) + a(i_4) + \cdots + a(i_{k-1})$. Hence we have proved $k \ne 6$ (Lemma 3). Suppose
$k = 7$ or $8$; we have

$$a(i_0)^2 = a(i_1)^2 = a(i_2)^2 = \Big(a(i_3) + \sum_{t=4}^{k-1} a(i_t)\Big)^2 = 4^{r-1}. \tag{8}$$
We may assume that $i_t(2)$ ($3 \le t < k$) are not all the same. If there is only
one 0 among $i_t(2)$ ($3 \le t < k$), assume $i_3(2) = 0$; then

$$S_{(0,1)}(1,1; 1,0,\dots,0) = a(i_3) - \sum_{t=4}^{k-1} a(i_t)$$

and

$$\Big(a(i_3) - \sum_{t=4}^{k-1} a(i_t)\Big)^2 = 4^{r-1}$$

by (5) and Lemma 2. It follows that $a(i_3)^2 = 4^{r-1}$ by (8) and $\sum_{t=0}^{3} a(i_t)^2 = 4^r$,
which contradicts (3). If there are exactly two 0 among $i_t(2)$ ($3 \le t < k$),
assume $i_3(2) = i_4(2) = 0$. If $k = 7$, we need only consider the case that
$i_t(2) = 0$ ($0 \le t < 3$). Since $i_5 \ne i_6$, we assume $i_5(3) = 0$, $i_6(3) = 1$. Then
$S_{(1,2)}(1,1) = a(i_5) + a(i_6)$, $S_{(1,2)}(1,1; 0,1,0,\dots,0) = a(i_5) - a(i_6)$. Hence

$$(a(i_5) + a(i_6))^2 = 0 \text{ or } 4^r,$$

$$(a(i_5) - a(i_6))^2 = 0 \text{ or } 4^r.$$

It follows that $a(i_5)^2 = a(i_6)^2 = 4^{r-1}$ and this fact together with (8) contradicts
(3). Hence we have proved $k \ne 7$. If $k = 8$, we need only consider the case
that there is only one 1 among $i_t(2)$ ($0 \le t < 3$). Taking $(i,j) = (1,2)$ when
$i_0(2) = 1$ or $i_2(2) = 1$, or $(i,j) = (0,2)$ when $i_1(2) = 1$, we can prove that it is also
impossible by Lemma 3.

Now we assume $k = 8$. Summarizing what has been proved above, for any $0 \le
j < r$, the $i_t(j)$ ($0 \le t < 8$) are all 0, or all 1, or half of them are 0. The last
case must appear since the $i_t$ ($0 \le t < 8$) are different from each other. We may
assume $i_t(0) = 0$ ($0 \le t < 4$) and $i_t(0) = 1$ ($4 \le t \le 7$). Furthermore we
assume that the $i_t(1)$ ($0 \le t \le 3$) are not all the same. It is impossible that there is only
one 0 (or 1) among $i_t(1)$ ($0 \le t \le 3$) (Lemma 3). Therefore we can assume
$i_0(1) = i_1(1) = 0$, $i_2(1) = i_3(1) = 1$, $i_4(1) = i_5(1) = 0$, $i_6(1) = i_7(1) = 1$, and
$i_0(2) = i_2(2) = i_4(2) = i_6(2) = 0$, $i_1(2) = i_3(2) = i_5(2) = i_7(2) = 1$. Taking
$I = (0,1)$, $(y(2), \dots, y(r-1)) = (0,0,\dots,0)$ and $(y(2), \dots, y(r-1)) = (1,0,\dots,0)$
respectively in (5), we obtain

$$(a(i_0) + a(i_1))^2 + (a(i_2) + a(i_3))^2 + (a(i_4) + a(i_5))^2 + (a(i_6) + a(i_7))^2 = 4^r,$$

$$(a(i_0) - a(i_1))^2 + (a(i_2) - a(i_3))^2 + (a(i_4) - a(i_5))^2 + (a(i_6) - a(i_7))^2 = 4^r.$$

When $a$ and $b$ are non-zero integers, $(a+b)^2$ and $(a-b)^2$ cannot be $4^{r-1}$ (or
0) simultaneously, hence we have by Lemma 2

$$(a(i_0) + a(i_1))^2 = (a(i_2) + a(i_3))^2 = (a(i_4) + a(i_5))^2 = (a(i_6) + a(i_7))^2 = 4^{r-1},$$
$$(a(i_0) - a(i_1))^2 = (a(i_2) - a(i_3))^2 = (a(i_4) - a(i_5))^2 = 0, \quad (a(i_6) - a(i_7))^2 = 4^r,$$

or

$$(a(i_0) + a(i_1))^2 = (a(i_2) + a(i_3))^2 = (a(i_4) + a(i_5))^2 = 0, \quad (a(i_6) + a(i_7))^2 = 4^r,$$
$$(a(i_0) - a(i_1))^2 = (a(i_2) - a(i_3))^2 = (a(i_4) - a(i_5))^2 = (a(i_6) - a(i_7))^2 = 4^{r-1}.$$
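Consider the first equation system. It follows that $a(i_0) = a(i_1) = \pm 2^{r-2}$, $a(i_2) =
a(i_3) = \pm 2^{r-2}$, $a(i_4) = a(i_5) = \pm 2^{r-2}$, $(a(i_6), a(i_7)) = \pm(3 \cdot 2^{r-2}, -2^{r-2})$.
Taking $j = 0$ in (2) we get

$$a(i_0) + a(i_2) + a(i_4) \pm 2^{r-2} = 2^{r-1}.$$

Therefore we obtain two solutions (ignoring the order):

$$(2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, -3 \cdot 2^{r-2}),$$

$$(2^{r-2}, 2^{r-2}, 2^{r-2}, 2^{r-2}, -2^{r-2}, -2^{r-2}, -2^{r-2}, 3 \cdot 2^{r-2}).$$

The second equation system has the same two solutions.

It is easy to check that

$$\tfrac{1}{4}\left((-1)^{i_0 \cdot x} + (-1)^{i_1 \cdot x} + (-1)^{i_2 \cdot x} + (-1)^{i_3 \cdot x} + (-1)^{i_4 \cdot x} + (-1)^{i_5 \cdot x} + (-1)^{i_6 \cdot x} - 3(-1)^{i_7 \cdot x}\right) = \pm 1$$

and

$$\tfrac{1}{4}\left((-1)^{i_0 \cdot x} + (-1)^{i_1 \cdot x} + (-1)^{i_2 \cdot x} + (-1)^{i_3 \cdot x} - (-1)^{i_4 \cdot x} - (-1)^{i_5 \cdot x} - (-1)^{i_6 \cdot x} + 3(-1)^{i_7 \cdot x}\right) = \pm 1$$

if $(i_0(j), i_1(j), \dots, i_7(j)) \in V_8$ ($0 \le j < r$). The Theorem is now proved completely.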
Abstract. Here I suggest a design criterion for the choice of connection-
polynomials in LFSR-based stream-cipher systems. I give estimates of 
orders of magnitude of the sparse-multiples of primitive-polynomials. I 
show that even for reasonable degrees (degrees of the order of 100) of 
primitive connection-polynomials the degrees of their sparse-multiples 
are “considerably higher”. 



1 Introduction 

A binary linear-feedback shift-register (LFSR, in short) is a system which
generates a pseudo-random bit-sequence using a binary recurrence-relation of the
form

$$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k}$$

where $c_k = 1$ and each $c_i$ other than $c_k$ belongs to $\{0,1\}$. The length of the LFSR
corresponds to the order $k$ of the linear-recurrence-relation used. The number of
taps of the LFSR is the number $t$ of non-zero bits in $\{c_1, c_2, \dots, c_k\}$.

Once the shift-register is initialised by assigning values to $a_0, a_1, \dots, a_{k-1}$,
i.e., once the seed of the LFSR is set, the successive bits of the sequence are
emitted using the chosen recurrence relation.
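
The recurrence above translates directly into code. The following is a minimal sketch; the function name `lfsr`, the coefficient list and the seed are assumptions made for the example (the recurrence $a_n = a_{n-3} + a_{n-4}$ is not taken from the paper).

```python
def lfsr(coeffs, seed, n):
    """First n bits of a_m = c_1 a_{m-1} + ... + c_k a_{m-k} over GF(2).

    coeffs = [c_1, ..., c_k] with c_k == 1; seed = [a_0, ..., a_{k-1}].
    """
    k = len(coeffs)
    assert coeffs[-1] == 1 and len(seed) == k
    a = list(seed)
    while len(a) < n:
        bit = 0
        for i, c in enumerate(coeffs, start=1):
            bit ^= c & a[-i]          # c_i * a_{n-i}
        a.append(bit)
    return a[:n]

coeffs = [0, 0, 1, 1]                 # a_n = a_{n-3} + a_{n-4}, length k = 4
taps = sum(coeffs)                    # number of non-zero c_i
bits = lfsr(coeffs, [1, 0, 0, 0], 30)
```

With this seed the state returns to (1, 0, 0, 0) after 15 steps, so the sequence has period $2^4 - 1$, the maximum possible for a register of length 4.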

The above LFSR is closely related to the following polynomial over GF(2)

$$c(X) = c_0 + c_1X + c_2X^2 + \cdots + c_kX^k$$

with $c_0 = 1$. This polynomial is called the connection-polynomial of the LFSR. If $X$
in $c(X)$ is interpreted as an operator that shifts left the argument sequence, it can
be inferred that the connection polynomial defines the fundamental recurrence
over the LFSR-generated sequence $a$. Similarly it can be seen that any multiple
of the connection-polynomial correspondingly defines a linear-recurrence-relation
which holds on the LFSR-generated sequence.
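
The claim that every multiple of the connection polynomial yields a recurrence on the sequence can be checked on a toy register. In this sketch polynomials over GF(2) are stored as Python integers (bit $i$ is the coefficient of $X^i$), and $X^i$ is read as a delay by $i$ positions, i.e. a term $p_iX^i$ contributes $p_ia_{n-i}$; this indexing convention, the helper names and the sample polynomials are assumptions made for the illustration.

```python
def lfsr(coeffs, seed, n):
    """Bits of a_m = c_1 a_{m-1} + ... + c_k a_{m-k} over GF(2)."""
    a = list(seed)
    while len(a) < n:
        bit = 0
        for i, c in enumerate(coeffs, start=1):
            bit ^= c & a[-i]
        a.append(bit)
    return a[:n]

def gf2_mul(p, q):
    """Carry-less product of two GF(2) polynomials stored as ints (bit i <-> X^i)."""
    result = 0
    while q:
        if q & 1:
            result ^= p
        p <<= 1
        q >>= 1
    return result

def holds_on(poly, bits):
    """True if sum_i p_i * a_{n-i} == 0 (mod 2) for every n >= deg(poly)."""
    exps = [i for i in range(poly.bit_length()) if poly >> i & 1]
    deg = exps[-1]
    return all(sum(bits[n - i] for i in exps) % 2 == 0 for n in range(deg, len(bits)))

c = 0b11001                      # c(X) = 1 + X^3 + X^4
bits = lfsr([0, 0, 1, 1], [1, 0, 0, 0], 60)
multiple = gf2_mul(c, 0b1011)    # c(X) * (1 + X + X^3), an arbitrary multiple
```

A polynomial that is not a multiple of $c(X)$, such as $1 + X + X^2$, fails the same check on this sequence.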

The connection-polynomials are in general chosen as primitive-polynomials 
over GF(2) in order to generate a key-stream of maximum periodicity for the 
given length of the LFSR. 
LFSRs are popularly employed in stream-cipher systems to generate a
key-stream sequence which is bitwise XORed with the message sequence to produce an
encrypted message. In practical implementations, the key-stream is usually
generated by combining the outputs of more than one LFSR using a non-linear
Boolean combining function. This arrangement significantly increases the
robustness of the system against possible attacks.

LFSR systems with very sparse connection-polynomials are particularly
vulnerable to various known attacks. The underlying principles of these
attacks are easily extendable to the situation where the feedback polynomial
has many terms but is a factor of a low-density polynomial of moderate degree.
For example, the primitive-polynomial $1 + […]$, though sufficiently dense, does not
qualify as the connection-polynomial of an LFSR. This is because this polynomial
divides the moderate-degree 4-nomial $[…] + 1$. Meier and Staffelbach [1] state
that "it appears to be very difficult to decide whether a given polynomial has
this (above mentioned) property". We address this issue and suggest a design
criterion for the choice of connection polynomials for LFSR-based stream-cipher
systems.



2 Some Results on Sparse-Multiples of Polynomials 

2.1 On Trinomial Multiples 

In this section we treat trinomial-multiples of polynomials and their associated 
properties. 

Theorem 1. If $f(x)$ is a primitive polynomial of degree $d > 0$ and if $x^s + x^t + 1$
is the least-degree trinomial-multiple of it then $s \le \frac{2^d + 2}{3}$.

Proof. Let $f(x)$ be a primitive polynomial of degree $d > 0$ and let $x^s + x^t + 1$
be the least-degree trinomial-multiple of it. Also let $e$ be the exponent to which
$f(x)$ belongs. Now consider the following sets of polynomials

$$S_1 = \{x, x^2, x^3, \dots, x^s\}$$

$$S_2 = \{x^s + x, x^s + x^2, x^s + x^3, \dots, x^s + x^{s-1}\}$$

$$S_3 = \{x^t + x, x^t + x^2, x^t + x^3, \dots, x^t + x^{s-1}\}$$

Now we make the following claims:

1) The set $S_1$ contains elements that are distinct (mod $f(x)$).

If this were not true we would have $x^i \equiv x^j \pmod{f(x)}$ for some $1 \le i, j \le s$
and $i \ne j$. Without loss of generality assume that $i > j$. Now since we are given
that $f(x)$ divides a trinomial with non-zero constant term, we can infer that $f(x)$
is prime to $x$. So cancelling out common $x$-power terms in the above congruence
we would have $x^{i-j} \equiv 1 \pmod{f(x)}$. This implies that $e$ divides $(i-j)$. But since
$(i-j) < s$, we can infer that $e < s$.

Now let $s'$ and $t'$ be the least non-negative residues (mod $e$) of $s$ and $t$
respectively. Since $x^s + x^t + 1 \equiv 0 \pmod{f(x)}$ we must have $x^{s'} + x^{t'} + 1 \equiv 0
\pmod{f(x)}$. Since $f(x)$ is prime to $x$, neither $s'$ nor $t'$ can be zero. Without loss of
generality assume that $s' > t'$ so that the degree of the trinomial $x^{s'} + x^{t'} + 1$ is
$s'$. Since $x^{s'} + x^{t'} + 1$ is a trinomial-multiple of $f(x)$ we should have $s' \ge s$. But
we inferred that $s > e$ in the last paragraph. So putting these inequalities together we
get $s' > e$. But this cannot be true. Hence our initial assumption must be wrong
and the set $S_1$ should indeed contain elements distinct (mod $f(x)$).

2) The sets $S_2$ and $S_3$ also contain elements that are distinct (mod $f(x)$).

The proof of this is very similar to that given for claim (1) above.

3) No two elements belonging to sets $S_1$ and $S_2$ are congruent (mod $f(x)$).

If this were not true we would have $x^i \equiv x^s + x^j \pmod{f(x)}$, $1 \le i \le s$, $1 \le j <
s$. Since $f(x)$ is prime to $x$, $s \ne i$ and $i \ne j$. Also $s \ne j$ (i.e., $s$, $i$, $j$ are all different).
In this case, as before, we could cancel out the common $x$-power terms in the
above congruence and end up with a trinomial-multiple of $f(x)$ whose degree is
less than $s$. But this would be a contradiction.

4) No two elements belonging to sets $S_1$ and $S_3$ are congruent (mod $f(x)$).

The proof is similar to that given for claim (3) above.

5) No two elements of sets $S_2$ and $S_3$ are congruent (mod $f(x)$).

If this were not true we would have $x^s + x^i \equiv x^t + x^j \pmod{f(x)}$ for some $1 \le i,
j \le s-1$. Noticing that $x^s + x^t \equiv 1 \pmod{f(x)}$, we would have $1 + x^i + x^j \equiv 0 \pmod{f(x)}$.
Furthermore $i$ cannot be equal to $j$. Thus, as before, we have ended up with
a trinomial multiple of $f(x)$ the degree of which is less than $s$. This cannot be true.

The claims (1) to (5) proved above, in effect, say that the sets $S_1$, $S_2$ and $S_3$
contain $(3s-2)$ elements distinct (mod $f(x)$). This is possible only if $(3s-2) \le 2^d$,
i.e., $s \le \frac{2^d + 2}{3}$. □
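
The bound $s \le (2^d+2)/3$ can be checked against a brute-force search on small primitive polynomials. The sketch below stores GF(2) polynomials as integers (bit $i$ is the coefficient of $x^i$); the helper names and the list of sample polynomials are assumptions made for the illustration.

```python
def gf2_mod(a, m):
    """Remainder of polynomial a modulo m over GF(2) (ints as bit vectors)."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def least_trinomial_multiple(f):
    """Smallest (s, t) with s > t > 0 and f | x^s + x^t + 1, searching s upward."""
    d = f.bit_length() - 1
    powers = [1]                          # powers[i] = x^i mod f
    for s in range(1, 2 ** d):
        powers.append(gf2_mod(powers[-1] << 1, f))
        for t in range(1, s):
            if powers[s] ^ powers[t] ^ 1 == 0:
                return s, t
    return None

# Sample primitive polynomials of degrees 3, 4, 5, 5 and 6.
SAMPLES = [0b1011, 0b10011, 0b100101, 0b111101, 0b1000011]
results = {f: least_trinomial_multiple(f) for f in SAMPLES}
```

For $x^5 + x^4 + x^3 + x^2 + 1$ (`0b111101`), which is not itself a trinomial, the search returns the multiple $x^8 + x^5 + 1$, well inside the bound $(2^5+2)/3 \approx 11.3$.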



Theorem 2. An irreducible polynomial belonging to exponent $e$ divides a
trinomial iff $\gcd(x^e + 1, (x+1)^e + 1)$ is non-trivial.

Proof. Let $f(x)$ be an irreducible polynomial of degree $d$ belonging to an exponent
$e$. Let $\alpha$ be a root of it. Now consider the polynomials

$$x^e + 1 = \prod_{i=0}^{e-1} (x + \alpha^i) \tag{1}$$

$$(x+1)^e + 1 = \prod_{i=0}^{e-1} (x + \alpha^i + 1) \tag{2}$$

If polynomials (1) and (2) have a non-trivial gcd then they have a common
root, implying that $\alpha^m = \alpha^n + 1$ for some non-negative $m, n < e$. This
suggests that $\alpha$ is a root of the polynomial $x^m + x^n + 1$. Since $f(x)$ is the minimal
polynomial of $\alpha$, this in turn suggests that $f(x)$ divides the trinomial $x^m + x^n + 1$.

Conversely, if $f(x)$ divides some trinomial $x^m + x^n + 1$ then it also divides
the trinomial $x^{m'} + x^{n'} + 1$ where $m'$ and $n'$ are the least positive residues (mod
$e$) of $m$ and $n$ respectively. Therefore, $\alpha^{m'} + \alpha^{n'} + 1 = 0$. This suggests that
polynomials (1) and (2) have a common root and hence a non-trivial gcd. □

For the sake of illustration, consider the polynomial $x^4 + x^3 + x^2 + x + 1$ which
is irreducible over GF(2). This polynomial belongs to exponent 5 and does not
divide any trinomial.
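
The gcd criterion of Theorem 2 is straightforward to evaluate. The sketch below computes $\gcd(x^e + 1, (x+1)^e + 1)$ over GF(2) with polynomials stored as integers, and cross-checks the illustration by direct search; the function names are chosen for the example.

```python
def gf2_mod(a, m):
    """Remainder of polynomial a modulo m over GF(2) (ints as bit vectors)."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def trinomial_gcd(e):
    """gcd(x^e + 1, (x + 1)^e + 1) over GF(2), as an int bit vector."""
    first = (1 << e) | 1
    power = 1
    for _ in range(e):
        power ^= power << 1          # multiply by (x + 1)
    return gf2_gcd(first, power ^ 1)

def divides_trinomial_below(f, e):
    """Direct search for x^m + x^n + 1 = 0 (mod f) with 0 < n < m < e."""
    powers = [1]
    for _ in range(e - 1):
        powers.append(gf2_mod(powers[-1] << 1, f))
    return any(powers[m] ^ powers[n] == 1 for m in range(1, e) for n in range(1, m))
```

For $e = 5$ the gcd is 1, in agreement with the statement that $x^4 + x^3 + x^2 + x + 1$ divides no trinomial; for $e = 7$ (the exponent of $x^3 + x + 1$) the gcd is $x^6 + x^5 + \cdots + 1$.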



Theorem 3. If $f(x)$ is a primitive-polynomial of degree $d$ and if $x^m + x^n + 1$ is
a trinomial divisible by $f(x)$ then $m$ and $n$ belong to same-length
cyclotomic-cosets (mod $(2^d - 1)$).

Proof. Assume that

$$x^m + x^n + 1 \equiv 0 \pmod{f(x)} \tag{3}$$

and let $l_m$ and $l_n$ be the lengths of the cyclotomic-cosets (mod $(2^d - 1)$) to which
$m$ and $n$ belong. So we have

$$2^{l_m} m \equiv m \pmod{2^d - 1}$$

$$2^{l_n} n \equiv n \pmod{2^d - 1}$$

Raising both sides of the congruence (3) to the power $2^{l_m}$ we have

$$x^{m2^{l_m}} + x^{n2^{l_m}} + 1 \equiv 0 \pmod{f(x)},$$

i.e.,

$$x^m + x^{n2^{l_m}} + 1 \equiv 0 \pmod{f(x)}. \tag{4}$$

Adding congruences (3) and (4) and rearranging terms we get

$$x^{(2^{l_m} - 1)n} \equiv 1 \pmod{f(x)}.$$

Therefore

$$(2^{l_m} - 1)n \equiv 0 \pmod{2^d - 1},$$

which implies that $l_n$ divides $l_m$. By similar reasoning, it follows that $l_m$ divides
$l_n$. Therefore $l_m = l_n$ and the theorem follows. □
For the sake of illustration, consider the case $d = 6$. The set of all
cyclotomic-cosets (mod $(2^6 - 1)$) is

$C_0 = \{0\}$
$C_1 = \{1, 2, 4, 8, 16, 32\}$
$C_3 = \{3, 6, 12, 24, 48, 33\}$
$C_5 = \{5, 10, 20, 40, 17, 34\}$
$C_7 = \{7, 14, 28, 56, 49, 35\}$
$C_9 = \{9, 18, 36\}$
$C_{11} = \{11, 22, 44, 25, 50, 37\}$
$C_{13} = \{13, 26, 52, 41, 19, 38\}$
$C_{15} = \{15, 30, 60, 57, 51, 39\}$
$C_{21} = \{21, 42\}$
$C_{23} = \{23, 46, 29, 58, 53, 43\}$
$C_{27} = \{27, 54, 45\}$
$C_{31} = \{31, 62, 61, 59, 55, 47\}$

The polynomial $x^6 + x + 1$ is primitive, and it divides a number of trinomials
of degree less than $(2^6 - 1)$.
Note that the powers of $x$ occurring in the same trinomial-multiple belong
to same-length cyclotomic-cosets (mod $(2^6 - 1)$).
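
The trinomial multiples of $x^6 + x + 1$ of degree below $2^6 - 1$, and the coset-length property of Theorem 3, can be enumerated with a short script. This is a sketch under the assumption that the polynomial is primitive (so that the powers of $x$ run through all non-zero residues); the helper names are chosen for the example.

```python
def gf2_mod(a, m):
    """Remainder of polynomial a modulo m over GF(2) (ints as bit vectors)."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def coset_length(m, d):
    """Length of the cyclotomic coset of m modulo 2^d - 1."""
    n = 2 ** d - 1
    length, v = 1, (2 * m) % n
    while v != m % n:
        v = (2 * v) % n
        length += 1
    return length

def trinomial_multiples(f):
    """All (m, n), m > n > 0, m < 2^d - 1, with f | x^m + x^n + 1 (f primitive)."""
    d = f.bit_length() - 1
    n = 2 ** d - 1
    powers = [1]
    for _ in range(n - 1):
        powers.append(gf2_mod(powers[-1] << 1, f))
    index = {p: e for e, p in enumerate(powers)}   # discrete-log table
    result = []
    for small in range(1, n):
        big = index[powers[small] ^ 1]             # x^big = x^small + 1 (mod f)
        if big > small:
            result.append((big, small))
    return result

F = 0b1000011                                       # x^6 + x + 1
pairs = trinomial_multiples(F)
same_length = all(coset_length(m, 6) == coset_length(n, 6) for m, n in pairs)
```

Each exponent $n$ in $1, \dots, 62$ has exactly one partner $m$, so the list contains 31 trinomials, among them $x^6 + x + 1$ itself.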

2.2 On 4-nomial Multiples 

In this section we give an upper bound on the degree of the minimum-degree
4-nomial-multiple of a polynomial.

Theorem 4. All primitive-polynomials of degree $d \ge 3$ divide some 4-nomial of
degree $\le \frac{1 + \sqrt{1 + 4 \cdot 2^{d+1}}}{2}$.

Proof. Let $f(x)$ be a primitive-polynomial of degree $d$. Let $f_0 = \left\lceil \frac{1 + \sqrt{1 + 4 \cdot 2^{d+1}}}{2} \right\rceil$.
Consider the set of all binomials of the form $x^i + x^j$, where $0 \le i, j < f$, for some
$f$ and $i \ne j$. There are $\binom{f}{2}$ of them. For the choice of $f = f_0$, the number of
these binomials exceeds $2^d$. Since there are only $2^d$ different congruence classes
(mod $f(x)$), by the pigeon-hole principle at least two of these binomials should be
congruent (mod $f(x)$). Thus there are two different unordered pairs $(r_1, s_1)$ and
$(r_2, s_2)$ such that

$$x^{r_1} + x^{s_1} \equiv x^{r_2} + x^{s_2} \pmod{f(x)}.$$

For $d \ge 3$, if $r_1$ were equal to $r_2$, then

$$x^{s_1} \equiv x^{s_2} \pmod{f(x)},$$

which implies that $s_1 \equiv s_2 \pmod{2^d - 1}$. Since $0 \le s_1, s_2 < f_0$ and $f_0 <
(2^d - 1)$ it follows that $s_1 = s_2$, contradicting the fact that $(r_1, s_1)$ and $(r_2, s_2)$
are different unordered pairs. Thus the above congruence gives rise to a 4-nomial
divisible by $f(x)$ and of degree at most $f_0$. □
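
The pigeon-hole argument can be compared with an exhaustive search on small primitive polynomials. The sketch below looks for the least degree $D$ of a 4-nomial $x^D + x^c + x^b + 1$ divisible by $f$ and compares it with $\lfloor (1 + \sqrt{1 + 4 \cdot 2^{d+1}})/2 \rfloor$. The closed form of the bound is an assumption reconstructed from the counting step of the proof (the number of binomials must exceed $2^d$); the helper names and sample polynomials are likewise chosen for the example.

```python
from math import isqrt

def gf2_mod(a, m):
    """Remainder of polynomial a modulo m over GF(2) (ints as bit vectors)."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def least_4nomial_degree(f):
    """Least D with f | x^D + x^c + x^b + 1 for some 0 < b < c < D (brute force).

    Assumes f is primitive of degree >= 3, so that such a multiple exists.
    """
    powers = [1, gf2_mod(2, f)]
    pair_sums = set()                                      # x^b + x^c mod f, b < c < D
    degree = 2
    while True:
        powers.append(gf2_mod(powers[-1] << 1, f))         # x^degree mod f
        for b in range(1, degree - 1):
            pair_sums.add(powers[b] ^ powers[degree - 1])  # pairs (b, degree - 1)
        if (powers[degree] ^ 1) in pair_sums:
            return degree
        degree += 1

def degree_bound(d):
    """Floor of (1 + sqrt(1 + 4 * 2^(d+1))) / 2, computed in exact integers."""
    return (1 + isqrt(1 + 4 * 2 ** (d + 1))) // 2

SAMPLES = [0b1011, 0b10011, 0b100101, 0b111101, 0b1000011]
found = {f: least_4nomial_degree(f) for f in SAMPLES}
```

For $x^3 + x + 1$ the search gives degree 4, from $(x^3 + x + 1)(x + 1) = x^4 + x^3 + x^2 + 1$, which meets the bound exactly.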



2.3 On Degrees of Sparse-Multiples of Primitive-Polynomials

In this section we study the nature of upper and lower bounds on the degrees of 
sparse-multiples of primitive-polynomials. 

Firstly we show that relatively few primitive-polynomials of reasonable
degree divide a lower-weight polynomial of lesser
degree. This result shows that any randomly chosen primitive-polynomial of
reasonable degree qualifies as a connection polynomial of an LFSR with high
probability.

Subsequently we comment on how small the degrees of sparse-multiples of 
certain primitive-polynomials could be. 

Lemma 1. For all integers $d$, $\phi(2^d - 1) > (1.548^d - 1)$.

Proof. Consider the factorization of $2^d - 1$ into primes. Let

$$2^d - 1 = \prod_{i=1}^{r} p_i^{e_i}$$

where each $p_i$ is a prime and each $e_i \ge 1$. Then

$$\phi(2^d - 1) = (2^d - 1)\prod_{i=1}^{r}\left(1 - \frac{1}{p_i}\right). \tag{5}$$

Since $2^d - 1$ is odd, each $p_i \ge 3$ and $(1 - \frac{1}{p_i}) \ge (2/3)$. Therefore,

$$\prod_{i=1}^{r}\left(1 - \frac{1}{p_i}\right) \ge (2/3)^r. \tag{6}$$

Also, $2^d - 1 \ge \prod_{i=1}^{r} p_i \ge 3^r$. So $2^d > 3^r$ and $r < (\log_3 2)d$. Therefore,

$$(2/3)^r > (2/3)^{(\log_3 2)d}. \tag{7}$$

Equation (5) together with inequalities (6) and (7) gives

$$\phi(2^d - 1) > (2^d - 1)(2/3)^{(\log_3 2)d},$$

i.e.

$$\phi(2^d - 1) > (1.548^d - 0.774^d) > (1.548^d - 1).$$

This proves the lemma. □
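
The constants in the lemma come from $2 \cdot (2/3)^{\log_3 2} \approx 1.548$ and $(2/3)^{\log_3 2} \approx 0.774$. The sketch below checks them and the inequality itself for small $d$, using a plain trial-division totient; the function name and the range of $d$ are choices made for the illustration.

```python
from math import log

def totient(n):
    """Euler's phi function by trial division (adequate for small n)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

ratio = (2 / 3) ** (log(2) / log(3))      # (2/3)^(log_3 2), about 0.774
growth = 2 * ratio                        # about 1.548
checks = [(d, totient(2 ** d - 1), 1.548 ** d - 1) for d in range(1, 25)]
```

Every triple in `checks` has its second entry larger than its third, as the lemma asserts.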



Theorem 5. For a given $t \ge 2$ and $s \ge 1$, if $d$ is such that $1.548^d - 1 >
(t-1)\binom{d^s+1}{t}$ then there exists at least one primitive-polynomial of degree $d$ which
does not divide any $t$-nomial of degree $\le d^s$.

Proof. Given $t \ge 2$ and $s \ge 1$, let $d$ be chosen such that $1.548^d - 1 > (t-1)\binom{d^s+1}{t}$.
Let us assume that for this choice of $d$, all primitive-polynomials of degree $d$
divide some $t$-nomial of degree $\le d^s$. If $\pi_{pri}(x,d)$ denotes the product of all
primitive-polynomials of degree $d$ and $\pi_{t\text{-}nomial}(x,d^s)$ denotes the product of
all $t$-nomials prime to $x$ and of degree $\le d^s$, then

$$\pi_{pri}(x,d) \text{ divides } \pi_{t\text{-}nomial}(x,d^s)$$

and

the degree of $\pi_{pri}(x,d)$ $\le$ the degree of $\pi_{t\text{-}nomial}(x,d^s)$.

The degree of $\pi_{pri}(x,d)$ is $\phi(2^d - 1)$.

The degree of $\pi_{t\text{-}nomial}(x,d^s)$ is $(t-1)\binom{d^s+1}{t}$.

Therefore

$$\phi(2^d - 1) \le (t-1)\binom{d^s+1}{t}. \tag{8}$$

By Lemma 1 we have

$$\phi(2^d - 1) > 1.548^d - 1. \tag{9}$$

Inequalities (8) and (9) together give

$$1.548^d - 1 < (t-1)\binom{d^s+1}{t}. \tag{10}$$

This inequality contradicts the choice of $d$ and hence our initial assumption must
be wrong. This establishes the theorem. □

Note here that the above theorem can be easily extended to the case where $d$ is
such that $\phi(2^d - 1) > (t-1)\binom{d^s+1}{t}$.

Theorem 6. Given $t \ge 2$ and $s \ge 1$, if $d$ is such that $\phi(2^d - 1) > (t-1)\binom{d^s+1}{t}$
then the probability that a randomly chosen primitive-polynomial of degree $d$ does
not divide any $t$-nomial of degree $\le d^s$ is $\ge 1 - \frac{(t-1)\binom{d^s+1}{t}}{\phi(2^d-1)}$.

Proof. Given $t \ge 2$ and $s \ge 1$, let $d$ be chosen as above. For this choice of $d$,
let $n(d,t,s)$ denote the number of primitive-polynomials $f(x)$ of degree $d$ that
divide some $t$-nomial of degree $\le d^s$. Then the product of all such polynomials
$f(x)$ divides $\pi_{t\text{-}nomial}(x,d^s)$. Considering the degrees of the respective products,
it follows that

$$n(d,t,s) \cdot d \le (t-1)\binom{d^s+1}{t},$$

i.e.,

$$n(d,t,s) \le \frac{(t-1)\binom{d^s+1}{t}}{d}.$$

Since the number of primitive-polynomials of degree $d$ is $\frac{\phi(2^d-1)}{d}$, the number of
primitive-polynomials of degree $d$ that do not divide any $t$-nomial of degree $\le d^s$ is
$\ge \frac{\phi(2^d-1)}{d} - \frac{(t-1)\binom{d^s+1}{t}}{d}$. If we denote by $p(d,t,s)$ the probability that a randomly
chosen primitive-polynomial of degree $d$ does not divide any $t$-nomial of degree
$\le d^s$, then

$$p(d,t,s) \ge 1 - \frac{(t-1)\binom{d^s+1}{t}}{\phi(2^d-1)}. \tag{11}$$

This proves the theorem. □

Note here that the above lower bound for $p(d,t,s)$ is arbitrarily close to 1 for
sufficiently large values of $d$ (irrespective of the choices of $s$ and $t$).

For the sake of completeness refer to Table 1. The table gives the least degree
$d \le 128$ such that $1 - \frac{(t-1)\binom{d^s+1}{t}}{\phi(2^d-1)} > 0.9$ for various $s$, $t$ pairs.

For example, the probability that a randomly chosen primitive polynomial of
degree 78 does not divide any trinomial of degree $\le 78^4$ (approx. $2^{25}$) is more
than 0.9.
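
The two ingredients of the bound are easy to evaluate for small parameters. In the sketch below the degree of the product of all $t$-nomials with constant term 1 and degree at most $D$ is taken to be $(t-1)\binom{D+1}{t}$, and this closed form is compared with a direct enumeration; the totient is computed by trial division, which limits the example to small $d$. Function names and the sample parameters $d = 20$, $t = 3$, $s = 1$ are assumptions made for the illustration and are not entries of the table.

```python
from itertools import combinations
from math import comb

def totient(n):
    """Euler's phi function by trial division (adequate for small n)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def tnomial_product_degree(D, t):
    """Closed form for the summed degrees of all t-nomials 1 + ... of degree <= D."""
    return (t - 1) * comb(D + 1, t)

def brute_force_degree(D, t):
    """Same quantity by listing the t-1 non-constant exponents of every t-nomial."""
    return sum(max(c) for c in combinations(range(1, D + 1), t - 1))

def lower_bound(d, t, s):
    """Lower bound of Theorem 6 on p(d, t, s)."""
    return 1 - tnomial_product_degree(d ** s, t) / totient(2 ** d - 1)

example = lower_bound(20, 3, 1)
```

Reproducing the table itself would require $\phi(2^d - 1)$ for $d$ up to 128, which needs the factorizations of large Mersenne-type numbers and is outside the scope of this sketch.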
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Table 1. The least degree d < 128 for which 1 



d-i) 

0 ( 2 '*-!) 



> 0.9 
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3 
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- 


4 




78 


109 
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5 




103 
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- 


- 


- 


- 


- 


- 



Theorem 7. There exist primitive polynomials of degree d which divide a trinomial
of degree 3d and a 4-nomial of degree < 6d.

Proof. Let f(x) be a primitive trinomial of odd degree d.¹ Let $\alpha$ be a root of it,
so that

$$f(\alpha) = 0. \qquad (12)$$

Since d is odd, $\gcd(3, 2^d - 1) = 1$. Let $\beta = \alpha^{1/3 \bmod (2^d - 1)}$. Note that $\beta$ is primitive
and that

$$\beta^3 = \alpha. \qquad (13)$$

Consider the minimal polynomial g(x) of $\beta$. g(x) is primitive and its degree is
d. Equations (12) and (13) give

$$f(\beta^3) = 0. \qquad (14)$$

Since g(x) is the minimal polynomial of $\beta$, we have that

$$f(x^3) \equiv 0 \pmod{g(x)}. \qquad (15)$$

Thus we have produced a primitive polynomial g(x) of degree d which divides a
trinomial $f(x^3)$ of degree 3d. Now construct a 4-nomial from $f(x^3)$ as follows.
For the sake of simplicity, call the polynomial $f(x^3)$ F(x). Let $F(x) = x^s + x^t + 1$,
where (s = 3d) > t. We have from equation (15)

$$F(x) \equiv 0 \pmod{g(x)}. \qquad (16)$$

So, since $F(x^2) = F(x)^2$ over GF(2),

$$F(x^2) \equiv 0 \pmod{g(x)}. \qquad (17)$$

Also,

$$x^s F(x) \equiv 0 \pmod{g(x)}. \qquad (18)$$

Equations (17) and (18) yield

$$x^s F(x) + F(x^2) \equiv 0 \pmod{g(x)},$$

i.e.,

$$x^{s+t} + x^s + x^{2t} + 1 \equiv 0 \pmod{g(x)}.$$

Note that $x^{s+t} + x^s + x^{2t} + 1$ is a 4-nomial of degree < 6d. Thus we have shown
that a primitive polynomial g(x) of degree d as chosen above divides a 4-nomial
of degree < 6d. □

¹ There possibly exist a good number of such primitive trinomials.

For example, if we choose x^'^ -I- -I- 1 as the primitive trinomial f(x) in the 

above given construction we can see that the primitive polynomial x^'^ + x^^ + 
x^^ + + x"^ + x^ + x“^ + x^ + I divides the 4-nomial -|- x^^ + -I- 1. 

It is worth noticing here that the construction used in the above theorem
is quite general. We could as well have started with a primitive 5-nomial or
7-nomial and used any other small n-th root as appropriate, instead of the cube
root (of the primitive element $\alpha$), and derived corresponding results.
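The construction of Theorem 7 can be checked mechanically for small degrees. The sketch below is an illustration under stated assumptions rather than the authors' code: polynomials over GF(2) are encoded as Python integers (bit i is the coefficient of $x^i$), the helper names are hypothetical, the input trinomial $x^d + x^k + 1$ is assumed to be primitive with d odd, and `pow(3, -1, m)` assumes Python 3.8 or later.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials encoded as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a: int, m: int) -> int:
    """Remainder of a modulo m in GF(2)[x]."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def min_poly(beta: int, d: int, f: int) -> int:
    """Minimal polynomial of beta in GF(2)[x]/(f): product of (x + beta^(2^i)), i < d."""
    coeffs = [1]                  # coeffs[i] is the GF(2^d) coefficient of x^i
    c = beta
    for _ in range(d):
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i + 1] ^= a                    # contribution of a * x
            nxt[i] ^= pmod(clmul(a, c), f)     # contribution of a * conjugate
        coeffs = nxt
        c = pmod(clmul(c, c), f)               # next conjugate
    # For an element of degree d every coefficient ends up being 0 or 1.
    return sum(a << i for i, a in enumerate(coeffs))


def theorem7(d: int, k: int):
    """Return (g, trinomial, 4-nomial) built from f = x^d + x^k + 1 as in the proof."""
    f = (1 << d) | (1 << k) | 1
    e = pow(3, -1, 2**d - 1)      # 1/3 modulo 2^d - 1
    beta, base = 1, 2             # alpha is the class of x, encoded as 2
    while e:                      # beta = alpha^e by square-and-multiply
        if e & 1:
            beta = pmod(clmul(beta, base), f)
        base = pmod(clmul(base, base), f)
        e >>= 1
    g = min_poly(beta, d, f)
    s, t = 3 * d, 3 * k
    tri = (1 << s) | (1 << t) | 1
    four = (1 << (s + t)) | (1 << s) | (1 << (2 * t)) | 1
    return g, tri, four


g, tri, four = theorem7(3, 1)     # f = x^3 + x + 1
print(bin(g), pmod(tri, g), pmod(four, g))
```

For $f = x^3 + x + 1$ this gives $g = x^3 + x^2 + 1$, which divides both $x^9 + x^3 + 1$ and $x^{12} + x^9 + x^6 + 1$.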

3 Conclusions 

Here we have shown that the degrees of sparse multiples of a primitive polynomial
of reasonable degree are, in general, sufficiently high. This conclusively
establishes that the sparse-multiples variants of various LFSR attacks are, in general,
infeasible, requiring very long ciphertexts.

Abstract. It is proved that the maximal possible nonlinearity of an n-variable
m-resilient Boolean function is $2^{n-1} - 2^{m+1}$ for $\frac{2n-7}{3} \le m \le n - 2$.
This value can be achieved only for optimized functions (i.e., functions
with an algebraic degree n — m — 1). For < m < n — log 2 — 2 
it is suggested a method to construct an n-variable m-resilient function 
with maximal possible nonlinearity $2^{n-1} - 2^{m+1}$ such that each variable
appears in the ANF of this function in some term of maximal possible length
$n - m - 1$.
correlation-immunity, resiliency, nonlinearity, algebraic degree, Siegenthaler's
Inequality, hardware implementation, pseudorandom generator.

1 Introduction 

One of the most general types of stream cipher systems consists of several Linear
Feedback Shift Registers (LFSRs) combined by a nonlinear Boolean function. This
function must satisfy certain criteria to resist different attacks (in particular,
correlation attacks suggested by Siegenthaler [14] and different types of linear attacks).
Besides, this function must have a sufficiently simple implementation scheme in
hardware. So, the following factors are considered important properties of
Boolean functions for use in stream cipher applications.

1. Balancedness. A Boolean function must output zeroes and ones with the
same probabilities.
2. Good correlation-immunity (of order m). The output of a Boolean function must
be statistically independent of any combination of m of its inputs. A balanced
correlation-immune function of order m is called m-resilient.
3. Good nonlinearity. The Boolean function must be at a sufficiently
large distance from any affine function.
4. High algebraic degree. The degree of the Algebraic Normal Form (ANF) of the
Boolean function must be sufficiently large.
5. High algebraic degree of each individual variable. Each variable of the Boolean
function must appear in the ANF of this function in some term of sufficiently large
length.
6. Simple implementation in hardware. The Boolean function must have a
sufficiently simple implementation scheme.



B. Roy and E. Okamoto (Eds.): INDOCRYPT 2000, LNCS 1977, pp. IQ HdUl 2000. 
(c) Springer- Verlag Berlin Heidelberg 2000 






Yuriy V. Tarannikov 



There are a lot of papers where only one of these criteria is studied. It was
found that the nonlinearity of a Boolean function does not exceed
$2^{n-1} - 2^{\frac{n}{2} - 1}$. The consideration of pairs of these criteria gave some
trade-offs between them. So, a Boolean function with maximal possible nonlinearity
cannot be balanced. Another result is Siegenthaler's Inequality [13]: if the function f is a
correlation-immune function of order m then $\deg(f) \le n - m$; moreover, if f is
m-resilient, $m \le n - 2$, then $\deg(f) \le n - m - 1$. Siegenthaler and other authors
pointed out that if a Boolean function is affine or depends linearly on a large
number of variables then this function has a simple implementation. But such a
function cannot be considered good for cryptographic applications because
of other criteria; in particular, the algebraic degrees of linear variables are 1.

The variety of criteria and the complicated trade-offs between them led to the
following approach: fix one or two parameters and try to optimize the others. The most
general model is when researchers fix the parameters n (number of variables)
and m (order of correlation-immunity) and try to optimize some other
cryptographically important parameters. Here we can mention the works [12,2,5,4,6,7,8].
In [15,11,16] it was proved (independently) that the nonlinearity of an n-variable
m-resilient Boolean function does not exceed $2^{n-1} - 2^{m+1}$.

The present paper is based on our preprint [15]; it continues the investigations
in this direction and gives new results. In Section 2 we give preliminary concepts,
notions and some simple lemmas. In Section 3 we give a recently established new
trade-off between resiliency and nonlinearity, namely, that the nonlinearity of an
n-variable m-resilient Boolean function does not exceed $2^{n-1} - 2^{m+1}$. Moreover,
it appears that this bound can be achieved only if Siegenthaler's Inequality is
achieved too. In Section 4 we discuss the concept of a linear variable and introduce
a new important concept of a pair of quasilinear variables, which is used in the
following sections. We discuss the connection of linear and quasilinear dependence
with resiliency and nonlinearity of the function and give a representation
form for a function with a pair of quasilinear variables. In Section 5 we present
our main construction method. This method allows one to construct recursively
functions with good cryptographic properties from functions with good
cryptographic properties and a smaller number of variables. By means of this method,
for $\frac{2n-7}{3} \le m \le n - 2$ we construct an m-resilient Boolean function of n variables
with nonlinearity $2^{n-1} - 2^{m+1}$, i.e., a function that achieves the upper bound
for the nonlinearity given in Section 3. The combination of this construction
with the upper bound gives the exact result: the maximal possible nonlinearity of an
n-variable m-resilient Boolean function is $2^{n-1} - 2^{m+1}$ for $\frac{2n-7}{3} \le m \le n - 2$.

This result was known only for m = n — 2 (trivial), m = n — 3 [H] and some 
small values of n. In Section 6 we strengthen the previous construction and show 
that for < m < n — log 2 — 2 it is possible to construct an n-variable 

m-resilient function with maximal possible nonlinearity $2^{n-1} - 2^{m+1}$ such that
each variable appears in the ANF of this function in some term of maximal possible
length $n - m - 1$ (i.e., each individual variable achieves Siegenthaler's Inequality).
Note that in [15] we also discuss how to implement in hardware the functions
constructed in previous sections. We suggest a concrete hardware scheme for an n-variable,
m-resilient function, $n \equiv 2 \pmod 3$, $m = \frac{2n-7}{3}$, that achieves a maximal
possible nonlinearity and a maximal possible algebraic degree for each variable
simultaneously. A scheme of hardware implementation for such a function is given.
It is remarkable that this scheme has a circuit complexity linear in n. It
contains 2n — 4 gates EXOR and gates AND. This scheme has a strongly 
regular cascade structure and can be used efficiently in practical design. In this 
paper the section on implementation is omitted because of lack of space. 



2 Preliminary Concepts and Notions 

We consider $V^n$, the vector space of n-tuples of elements from GF(2). A Boolean
function is a function from $V^n$ to GF(2). The weight wt(f) of a function f on $V^n$
is the number of vectors $\sigma$ on $V^n$ such that $f(\sigma) = 1$. A function f is said to be
balanced if $wt(f) = wt(f \oplus 1)$. Obviously, if a function f on $V^n$ is balanced then
$wt(f) = 2^{n-1}$. A subfunction of the Boolean function f is a function f' obtained
by substituting some constants for some variables in f. If we substitute in the
function f the constants $\sigma_{i_1}, \ldots, \sigma_{i_s}$ for the variables $x_{i_1}, \ldots, x_{i_s}$ respectively
then the obtained subfunction is denoted by $f^{\sigma_{i_1}, \ldots, \sigma_{i_s}}_{x_{i_1}, \ldots, x_{i_s}}$. If a variable $x_i$ is not
substituted by a constant then $x_i$ is called a free variable for f'.

It is well known that a function f on $V^n$ can be uniquely represented by
a polynomial on GF(2) whose degree in each variable is at most 1. Namely,

$$f(x_1, \ldots, x_n) = \bigoplus_{(a_1, \ldots, a_n) \in V^n} g(a_1, \ldots, a_n)\, x_1^{a_1} \cdots x_n^{a_n},$$

where g is also a function on $V^n$. This polynomial representation of f is called the
algebraic normal form (briefly, ANF) of the function and each $x_1^{a_1} \cdots x_n^{a_n}$ is called
a term in the ANF of f.

The algebraic degree of f, denoted by deg(f), is defined as the number of variables
in the longest term of f. The algebraic degree of a variable $x_i$ in f, denoted by
$\deg(f, x_i)$, is the number of variables in the longest term of f that contains $x_i$.
If $\deg(f, x_i) = 0$ then the variable $x_i$ is called fictitious for the function f. If
$\deg(f, x_i) = 1$, we say that f depends on $x_i$ linearly. If $\deg(f, x_i) \ge 2$, we say
that f depends on $x_i$ nonlinearly. A term of length 1 is called a linear term. If
$\deg(f) \le 1$ then f is called an affine function.
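The ANF coefficients g(a) and the degrees defined above can be computed from a truth table with the binary Moebius transform. This is a minimal sketch under stated assumptions, not something taken from the paper: a function on $V^n$ is stored as a list of $2^n$ bits, bit i - 1 of the list index holds the value of $x_i$, and the helper names are hypothetical.

```python
def anf(tt):
    """Binary Moebius transform: truth table -> list of ANF coefficients g(a)."""
    g = list(tt)
    n = len(g).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for a in range(len(g)):
            if a & step:
                g[a] ^= g[a ^ step]
    return g


def degree(tt):
    """deg(f): number of variables in the longest term of the ANF."""
    return max((bin(a).count("1") for a, c in enumerate(anf(tt)) if c), default=0)


def degree_in(tt, i):
    """deg(f, x_i): longest ANF term containing x_i (i is 1-based)."""
    bit = 1 << (i - 1)
    return max((bin(a).count("1") for a, c in enumerate(anf(tt)) if c and a & bit),
               default=0)


# f = x1 (x2 + x3) + x2 on V^3, with x1 in bit 0 of the index.
tt = [((a & 1) & (((a >> 1) ^ (a >> 2)) & 1)) ^ ((a >> 1) & 1) for a in range(8)]
print([a for a, c in enumerate(anf(tt)) if c], degree(tt))
```

The nonzero coefficients sit at indices 2, 3 and 5, that is, the terms $x_2$, $x_1x_2$ and $x_1x_3$.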

The Hamming distance $d(\sigma_1, \sigma_2)$ between two vectors $\sigma_1$ and $\sigma_2$ is the number
of components where the vectors $\sigma_1$ and $\sigma_2$ differ. For two Boolean functions $f_1$ and
$f_2$ on $V^n$, we define the distance between $f_1$ and $f_2$ by
$d(f_1, f_2) = \#\{\sigma \in V^n \mid f_1(\sigma) \ne f_2(\sigma)\}$. The minimum distance between f and
the set of all affine functions is called the nonlinearity of f and denoted by nl(f).
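For small n the nonlinearity can be computed directly from this definition by measuring the distance to every affine function. The sketch below assumes the same hypothetical truth-table encoding as a list of $2^n$ bits with $x_i$ in bit i - 1 of the index; it is exponential in n and meant only as an illustration.

```python
def nonlinearity(tt):
    """Minimum Hamming distance from f to all affine functions c0 + <c, x>."""
    n = len(tt).bit_length() - 1
    best = len(tt)
    for c in range(1 << n):
        # Distance to the linear function <c, x>.
        dist = sum(tt[a] ^ (bin(a & c).count("1") & 1) for a in range(len(tt)))
        # The complement <c, x> + 1 is at distance 2^n - dist.
        best = min(best, dist, len(tt) - dist)
    return best


print(nonlinearity([0, 0, 0, 1]))              # f = x1 x2
print(nonlinearity([0, 0, 1, 0, 0, 1, 1, 1]))  # f = x1 (x2 + x3) + x2
```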

A Boolean function f on $V^n$ is said to be correlation-immune of order m,
with $1 \le m \le n$, if the output of f and any m input variables are statistically
independent. This concept was introduced by Siegenthaler [13]. In an equivalent
non-probabilistic formulation the Boolean function f is called correlation-immune of
order m if $wt(f') = wt(f)/2^m$ for any of its subfunctions f' of n - m variables. A
balanced mth order correlation-immune function is called an m-resilient function.
In other words, the Boolean function f is called m-resilient if $wt(f') = 2^{n-m-1}$
for any of its subfunctions f' of n - m variables. From this point of view we can
consider formally any balanced Boolean function as 0-resilient (this convention
is accepted in [niE]) and an arbitrary Boolean function as (— l)-resilient. The 
concept of an m-resilient function was introduced in . 
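The non-probabilistic definition translates directly into a brute-force test: fix every choice of m variables to every choice of constants and compare the weight of the resulting subfunction with $2^{n-m-1}$. The following sketch assumes a truth table stored as a list of $2^n$ bits with $x_i$ in bit i - 1 of the index; the function names are hypothetical, and the cost grows quickly with n.

```python
from itertools import combinations, product


def is_resilient(tt, m):
    """True if every subfunction of n - m variables has weight 2^(n-m-1)."""
    n = len(tt).bit_length() - 1
    if m < 0:
        return True  # convention: every Boolean function is (-1)-resilient
    for fixed in combinations(range(n), m):
        for consts in product((0, 1), repeat=m):
            weight = sum(
                tt[a]
                for a in range(len(tt))
                if all((a >> i) & 1 == c for i, c in zip(fixed, consts))
            )
            if weight != 1 << (n - m - 1):
                return False
    return True


def resiliency_order(tt):
    """Largest m such that f is m-resilient (-1 if f is not balanced)."""
    n = len(tt).bit_length() - 1
    m = -1
    while m + 1 <= n - 1 and is_resilient(tt, m + 1):
        m += 1
    return m


parity3 = [bin(a).count("1") & 1 for a in range(8)]    # x1 + x2 + x3
print(resiliency_order(parity3))
print(resiliency_order([0, 0, 1, 0, 0, 1, 1, 1]))      # x1 (x2 + x3) + x2
print(resiliency_order([0, 0, 0, 1]))                  # x1 x2
```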

Siegenthaler's Inequality [13] states that if the function f is a correlation-immune
function of order m then $\deg(f) \le n - m$. Moreover, if f is m-resilient,
$m \le n - 2$, then $\deg(f) \le n - m - 1$. An m-resilient Boolean function f is called
optimized if $\deg(f) = n - m - 1$ ($m \le n - 2$).

The next two lemmas are well-known. 

Lemma 2.1 Let $f(x_1, \ldots, x_n)$ be a Boolean function on $V^n$. Then deg(f) = n
iff wt(f) is odd.

Lemma 2.2 Let $f(x_1, \ldots, x_n)$ be a Boolean function represented in the form

$$f(x_1, \ldots, x_n) = \bigoplus_{(\sigma_1, \ldots, \sigma_l)} (x_1 \oplus \sigma_1) \cdots (x_l \oplus \sigma_l)\, f(\sigma_1 \oplus 1, \ldots, \sigma_l \oplus 1, x_{l+1}, \ldots, x_n).$$

Suppose that all $2^l$ subfunctions $f(\sigma_1 \oplus 1, \ldots, \sigma_l \oplus 1, x_{l+1}, \ldots, x_n)$ are m-resilient.
Then the function f is m-resilient too.

Lemma 2.2 was proved in a lot of papers including (for l = 1) the pioneering
paper of Siegenthaler (Theorem 2 in [13]). The general case follows immediately
from the case l = 1.

3 Upper Bound for the Nonlinearity of Resilient 
Functions 

Let n and m be integers, $-1 \le m \le n$. Denote by nlmax(n, m) the maximal
possible nonlinearity of an m-resilient Boolean function on $V^n$. It is well known
that the nonlinearity of a Boolean function does not exceed $2^{n-1} - 2^{\frac{n}{2} - 1}$.
Thus,

$$nlmax(n, -1) \le 2^{n-1} - 2^{\frac{n}{2} - 1}. \qquad (1)$$

This value can be achieved only for even n.

In [15,11,16] it was proved (independently) that $nlmax(n, m) \le 2^{n-1} - 2^{m+1}$.
Here we give this result without the proof.

Theorem 3.1 Let $f(x_1, \ldots, x_n)$ be an m-resilient Boolean function, $m \le n - 2$.
Then

$$nl(f) \le 2^{n-1} - 2^{m+1}. \qquad (2)$$

Corollary 3.1 $nlmax(n, m) \le 2^{n-1} - 2^{m+1}$ for $m \le n - 2$.

If $m \le \frac{n}{2} - 2$ the inequality (2) does not give us any new information because
of the well-known inequality (1). But in the following sections we show that the
inequality (2) is achieved for a wide spectrum of large m.

The next theorem is proved in m- 

Theorem 3.2 Let $f(x_1, \ldots, x_n)$ be an m-resilient nonoptimized Boolean
function, $m \le n - 3$. Then $nl(f) \le 2^{n-1} - 2^{m+2}$.

Corollary 3.2 The inequality (2) can be achieved only for optimized functions.

Thus, the inequality (2) can be achieved only if Siegenthaler's Inequality is
achieved too.

4 On Linear and Quasilinear Variables

Recall that a variable $x_i$ is called linear for a function
$f = f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)$ if $\deg(f, x_i) = 1$. Also we say that a function f
depends on a variable $x_i$ linearly. If a variable $x_i$ is linear for a function f we can
represent f in the form

$$f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) = g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \oplus x_i.$$

Another equivalent definition of a linear variable is that a variable $x_i$ is linear
for a function f if $f(\delta_1) \ne f(\delta_2)$ for any two vectors $\delta_1$ and $\delta_2$ that differ only
in the ith component. By analogy with the last definition we give a new definition
for a pair of quasilinear variables.

Definition 4.1 We say that a Boolean function $f = f(x_1, \ldots, x_n)$ depends
on a pair of its variables $(x_i, x_j)$ quasilinearly if $f(\delta_1) \ne f(\delta_2)$ for any two
vectors $\delta_1$ and $\delta_2$ of length n that differ only in the ith and jth components. A pair
$(x_i, x_j)$ in this case is called a pair of quasilinear variables in f.

Lemma 4.1 Let $f(x_1, \ldots, x_n)$ be a Boolean function. Then $(x_i, x_j)$, i < j,
is a pair of quasilinear variables in f iff f can be represented in the form

$$f(x_1, \ldots, x_n) = g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n, x_i \oplus x_j) \oplus x_i. \qquad (3)$$

Proof. If f is represented in the form (3) then, obviously, the pair $(x_i, x_j)$ is
quasilinear in f. Suppose that the pair $(x_i, x_j)$ is quasilinear in f. The function f
can be written in the form $f = g_1 x_i x_j \oplus g_2 x_i \oplus g_3 x_j \oplus g_4 = h(x_i, x_j)$ where the $g_k$
are functions of the remaining variables. We have $h(0, 0) \ne h(1, 1)$ and
$h(0, 1) \ne h(1, 0)$. So $g_1 \oplus g_2 \oplus g_3 \ne 0$ and $g_3 \ne g_2$. Thus $g_1 \oplus g_2 \oplus g_3 = 1$ and
$g_2 \oplus g_3 = 1$, so $g_1 = 0$. Also $g_2 = g_3 \oplus 1$. Therefore
$f = (g_3 \oplus 1) x_i \oplus g_3 x_j \oplus g_4 = (g_3 (x_i \oplus x_j) \oplus g_4) \oplus x_i$, as desired. □
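Definition 4.1 and the representation (3) can both be tested on a truth table. In the sketch below (an illustration with hypothetical helper names, assuming a truth table stored as a list of $2^n$ bits with $x_i$ in bit i - 1 of the index) the first function checks the definition, and the second checks that $f \oplus x_i$ is unchanged when $x_i$ and $x_j$ are flipped together, which is exactly what being a function of $x_i \oplus x_j$ and the remaining variables means.

```python
def is_quasilinear_pair(tt, i, j):
    """Definition 4.1: flipping x_i and x_j together always changes f (1-based i, j)."""
    mask = (1 << (i - 1)) | (1 << (j - 1))
    return all(tt[a] != tt[a ^ mask] for a in range(len(tt)))


def has_form_3(tt, i, j):
    """Representation (3): f + x_i depends on x_i, x_j only through x_i + x_j."""
    mask = (1 << (i - 1)) | (1 << (j - 1))
    h = [tt[a] ^ ((a >> (i - 1)) & 1) for a in range(len(tt))]
    return all(h[a] == h[a ^ mask] for a in range(len(tt)))


f32 = [0, 0, 1, 0, 0, 1, 1, 1]      # f = x1 (x2 + x3) + x2
print(is_quasilinear_pair(f32, 2, 3), has_form_3(f32, 2, 3))
print(is_quasilinear_pair(f32, 1, 2), has_form_3(f32, 1, 2))
```

For $f = x_1(x_2 \oplus x_3) \oplus x_2$ the pair $(x_2, x_3)$ passes both checks and the pair $(x_1, x_2)$ fails both, in line with Lemma 4.1.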

Lemma 4.2 Let $f(x_1, \ldots, x_n)$ be a Boolean function. If f depends on some
variable $x_i$ linearly then f is balanced.

Proof. Combine all $2^n$ argument vectors of the function f into pairs so that any pair
$(\sigma_1, \sigma_2)$ contains vectors $\sigma_1$ and $\sigma_2$ that differ in the ith component and coincide in
all other components. Then $f(\sigma_1) \ne f(\sigma_2)$. So, $wt(f) = 2^{n-1}$ and f is balanced. □

Corollary 4.1 Let $f(x_1, \ldots, x_n)$ be a Boolean function. If f depends on
some variables $x_{i_1}, x_{i_2}, \ldots, x_{i_s}$ linearly then f is (s - 1)-resilient.

Note that Corollary 4.1 agrees with our assumption that a balanced
function is 0-resilient, and an arbitrary Boolean function is (-1)-resilient. (In
the last case s = 0.)

Lemma 4.3 Let $f(x_1, \ldots, x_n)$ be a Boolean function. If f depends on some
pair of variables $(x_i, x_j)$ quasilinearly then f is balanced.

Proof. Combine all $2^n$ argument vectors of the function f into pairs so that any pair
$(\sigma_1, \sigma_2)$ contains vectors $\sigma_1$ and $\sigma_2$ that differ in the ith and jth components and
coincide in all other components. Then $f(\sigma_1) \ne f(\sigma_2)$. So, the function f is
balanced. □

The next two lemmas can be proved easily.

Lemma 4.4 Let $f(x_1, \ldots, x_n, x_{n+1}) = g(x_1, \ldots, x_n) \oplus c x_{n+1}$ where
$c \in \{0, 1\}$. Then nl(f) = 2nl(g).

Lemma 4.5 Let $f(x_1, \ldots, x_n)$ be a Boolean function on $V^n$ and let f depend
on some pair of variables $(x_i, x_j)$ quasilinearly. Then nl(f) = 2nl(g) where g is
the function used in the representation of f in the form (3) in Lemma 4.1.



5 A Method of Constructing 



Theorem 3.1 shows that the nonlinearity of m-resilient Boolean function on 
can not exceed 2"’“^ — 2"*+^. Earlier in papers ii-T-ii.n the authors developed 
methods for the constructing of m-resilient Boolean functions of n variables with 
high nonlinearity, and, in particular, the nonlinearity 2"“^ — 2™+^ in these four 
papers can be achieved for m -I- 3 > 2"“™“^. The methods suggested in these 
papers are quite different but in the part of spectrum given by the inequal- 
ity m -I- 3 > 2"“"*“^ these methods give really the same construction. Com- 
bination of these results with our upper bound © from Theorem 3.1 proves 
that nlmax{n,m) = 2"“^ — 2™+^ for m -I- 3 > 2"“"*“^. In this section we 
prove a stronger result, namely, we prove that $nlmax(n, m) = 2^{n-1} - 2^{m+1}$ for
$\frac{2n-7}{3} \le m \le n - 2$.

Lemma 5.1 Let n be a positive integer. Let $f_1(x_1, \ldots, x_n)$ and $f_2(y_1, \ldots, y_n)$
be m-resilient Boolean functions on $V^n$ such that $nl(f_1) \ge N_0$, $nl(f_2) \ge N_0$.
Moreover, there exist two variables $x_i$ and $x_j$ such that $f_1$ depends on the
variables $x_i$ and $x_j$ linearly, and $f_2$ depends on a pair of the variables $(x_i, x_j)$
quasilinearly. Then the function

$$f_1'(x_1, \ldots, x_n, x_{n+1}) = (x_{n+1} \oplus 1) f_1(x_1, \ldots, x_n) \oplus x_{n+1} f_2(x_1, \ldots, x_n) \qquad (4)$$

is an m-resilient Boolean function on $V^{n+1}$ with nonlinearity $nl(f_1') \ge 2^{n-1} + N_0$,
and the function

$$f_2'(x_1, \ldots, x_n, x_{n+1}, x_{n+2}) = (x_{n+1} \oplus x_{n+2} \oplus 1) f_1(x_1, \ldots, x_n) \oplus (x_{n+1} \oplus x_{n+2}) f_2(x_1, \ldots, x_n) \oplus x_{n+1} \qquad (5)$$

is an (m + 1)-resilient Boolean function on $V^{n+2}$ with nonlinearity $nl(f_2') \ge 2^n + 2N_0$.
Moreover, $f_2'$ depends on a pair of the variables $(x_{n+1}, x_{n+2})$ quasilinearly.

Proof. At first, consider the equation (4). Both subfunctions
$(f_1')^0_{x_{n+1}} = f_1(x_1, \ldots, x_n)$ and $(f_1')^1_{x_{n+1}} = f_2(x_1, \ldots, x_n)$ are m-resilient, hence by
Lemma 2.2 $f_1'$ is m-resilient too. Let $l = \bigoplus_{i=1}^{n+1} c_i x_i \oplus c_0$ be an arbitrary affine
function on $V^{n+1}$. Then

$$d(f_1', l) = d(f_1, l^0_{x_{n+1}}) + d(f_2, l^1_{x_{n+1}}) = wt(f_1 \oplus l^0_{x_{n+1}}) + wt(f_2 \oplus l^1_{x_{n+1}}).$$

We state that at least one of the two functions $f_1 \oplus l^0_{x_{n+1}}$ and $f_2 \oplus l^1_{x_{n+1}}$ is balanced.
Indeed, if $c_i = 0$ or $c_j = 0$ then the function $f_1 \oplus l^0_{x_{n+1}}$ depends on $x_i$ or $x_j$
linearly, hence, by Lemma 4.2 the function $f_1 \oplus l^0_{x_{n+1}}$ is balanced. In the remaining
case $c_i = 1$ and $c_j = 1$ it is easy to see from the representation (3) that the
function $f_2 \oplus l^1_{x_{n+1}}$ depends on a pair of the variables $(x_i, x_j)$ quasilinearly, therefore
by Lemma 4.3 the function $f_2 \oplus l^1_{x_{n+1}}$ is balanced. Thus, $d(f_1', l) \ge 2^{n-1} + N_0$.
The affine function l was chosen arbitrarily, therefore $nl(f_1') \ge 2^{n-1} + N_0$.

Next, consider the equation (5). By construction (5) and representation (3)
we see that $f_2'$ depends on a pair of the variables $(x_{n+1}, x_{n+2})$ quasilinearly. Now
we want to prove that the function $f_2'$ is (m + 1)-resilient. Substitute arbitrary
m + 1 variables by constants, generating the subfunction $\hat{f}$. If both variables
$x_{n+1}$ and $x_{n+2}$ are free in $\hat{f}$ then $\hat{f}$ depends on the pair $(x_{n+1}, x_{n+2})$ quasilinearly,
therefore by Lemma 4.3 the function $\hat{f}$ is balanced. If at least one of the two variables
$x_{n+1}$ and $x_{n+2}$ was substituted by a constant then we substituted by constants
at most m of the first n variables $x_1, \ldots, x_n$. But the functions
$(f_2')^{0,0}_{x_{n+1},x_{n+2}} = f_1$, $(f_2')^{0,1}_{x_{n+1},x_{n+2}} = f_2$, $(f_2')^{1,0}_{x_{n+1},x_{n+2}} = f_2 \oplus 1$,
$(f_2')^{1,1}_{x_{n+1},x_{n+2}} = f_1 \oplus 1$ are m-resilient, thus,
by Lemma 2.2 the function $\hat{f}$ is balanced. The subfunction $\hat{f}$ was chosen arbitrarily.
So, the function $f_2'$ is (m + 1)-resilient.

Finally, we need to prove the lower bound for the nonlinearity of $f_2'$. Let
$l = \bigoplus_{i=1}^{n+2} c_i x_i \oplus c_0$ be an arbitrary affine function on $V^{n+2}$. Then

$$d(f_2', l) = d(f_1, l^{0,0}_{x_{n+1},x_{n+2}}) + d(f_2, l^{0,1}_{x_{n+1},x_{n+2}}) + d(f_2 \oplus 1, l^{1,0}_{x_{n+1},x_{n+2}}) + d(f_1 \oplus 1, l^{1,1}_{x_{n+1},x_{n+2}})$$
$$= wt(f_1 \oplus l^{0,0}_{x_{n+1},x_{n+2}}) + wt(f_2 \oplus l^{0,1}_{x_{n+1},x_{n+2}}) + wt(f_2 \oplus l^{1,0}_{x_{n+1},x_{n+2}} \oplus 1) + wt(f_1 \oplus l^{1,1}_{x_{n+1},x_{n+2}} \oplus 1).$$

By the same reasoning as above, at least one of the two functions
$f_1 \oplus l^{0,0}_{x_{n+1},x_{n+2}}$ and $f_2 \oplus l^{0,1}_{x_{n+1},x_{n+2}}$ is balanced, and at least one of the two
functions $f_2 \oplus l^{1,0}_{x_{n+1},x_{n+2}} \oplus 1$ and $f_1 \oplus l^{1,1}_{x_{n+1},x_{n+2}} \oplus 1$ is balanced. Thus,
$d(f_2', l) \ge 2^n + 2N_0$. The affine function l was chosen arbitrarily, therefore
$nl(f_2') \ge 2^n + 2N_0$. □
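The two constructions (4) and (5) amount to concatenating truth tables, so Lemma 5.1 can be checked on a small instance. The sketch below is an illustration only: truth tables are lists of $2^n$ bits with $x_i$ in bit i - 1 of the index, the helper names are hypothetical, and resiliency is measured through the Walsh spectrum using the standard Xiao-Massey characterization (f is m-resilient iff its Walsh transform vanishes on all vectors of weight at most m), which is not taken from this paper. The inputs are $f_1 = x_1x_4 \oplus x_2 \oplus x_3$ and $f_2 = x_1(x_2 \oplus x_3) \oplus x_2 \oplus x_4$ on $V^4$ with i = 2, j = 3, m = 1 and $N_0 = 4$.

```python
def walsh(tt):
    """Walsh-Hadamard spectrum of (-1)^f, computed with the fast butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for s in range(0, len(w), 2 * h):
            for k in range(s, s + h):
                w[k], w[k + h] = w[k] + w[k + h], w[k] - w[k + h]
        h *= 2
    return w


def nonlinearity(tt):
    return (len(tt) - max(abs(v) for v in walsh(tt))) // 2


def resiliency(tt):
    """Largest m with zero spectrum on all weights <= m (-1 if unbalanced)."""
    w = walsh(tt)
    n = len(tt).bit_length() - 1
    m = -1
    while m + 1 < n and all(
        w[u] == 0 for u in range(len(w)) if bin(u).count("1") == m + 1
    ):
        m += 1
    return m


def lemma_5_1(f1, f2):
    """Truth tables of f1' from (4) and f2' from (5)."""
    inv = lambda t: [b ^ 1 for b in t]
    # (4): x_{n+1} = 0 selects f1, x_{n+1} = 1 selects f2.
    g1 = f1 + f2
    # (5): blocks ordered by (x_{n+1}, x_{n+2}) = (0,0), (1,0), (0,1), (1,1).
    g2 = f1 + inv(f2) + f2 + inv(f1)
    return g1, g2


x = lambda a, i: (a >> (i - 1)) & 1
f1 = [(x(a, 1) & x(a, 4)) ^ x(a, 2) ^ x(a, 3) for a in range(16)]
f2 = [(x(a, 1) & (x(a, 2) ^ x(a, 3))) ^ x(a, 2) ^ x(a, 4) for a in range(16)]
g1, g2 = lemma_5_1(f1, f2)
print(resiliency(g1), nonlinearity(g1))   # bound: 1-resilient, nl >= 2^3 + 4
print(resiliency(g2), nonlinearity(g2))   # bound: 2-resilient, nl >= 2^4 + 8
```

Here the lower bounds of the lemma coincide with the upper bound of Theorem 3.1, so the computed nonlinearities are 12 and 24.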

Lemma 5.2 Suppose that there exist an m-resilient Boolean function $f_{n,1}$
on $V^n$, $nl(f_{n,1}) \ge N_0$, and an (m + 1)-resilient Boolean function $f_{n+1,2}$ on $V^{n+1}$,
$nl(f_{n+1,2}) \ge 2N_0$, besides the function $f_{n+1,2}$ depends on some pair of its
variables $(x_i, x_j)$ quasilinearly. Then there exist an (m + 2)-resilient Boolean function
$f_{n+3,1}$ on $V^{n+3}$, $nl(f_{n+3,1}) \ge 2^{n+1} + 4N_0$, and an (m + 3)-resilient Boolean function
$f_{n+4,2}$ on $V^{n+4}$, $nl(f_{n+4,2}) \ge 2^{n+2} + 8N_0$, besides the function $f_{n+4,2}$ depends
on some pair of its variables quasilinearly.

Proof. We can assume that i < j. Denote

$$f_1(x_1, \ldots, x_{n+2}) = f_{n,1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{j-1}, x_{j+1}, \ldots, x_{n+2}) \oplus x_i \oplus x_j,$$
$$f_2(x_1, \ldots, x_{n+2}) = f_{n+1,2}(x_1, \ldots, x_{n+1}) \oplus x_{n+2}.$$

By Lemmas 4.2 and 4.4 the functions $f_1$ and $f_2$ are (m + 2)-resilient functions
on $V^{n+2}$, $nl(f_1) \ge 4N_0$, $nl(f_2) \ge 4N_0$. Moreover, $f_1$ depends on the variables $x_i$
and $x_j$ linearly, and $f_2$ depends on a pair of the variables $(x_i, x_j)$ quasilinearly.
Substituting $f_1$ and $f_2$ into (4) and (5) (we shift $n \to n + 2$) we have

$$f_1'(x_1, \ldots, x_{n+3}) = (x_{n+3} \oplus 1) f_1(x_1, \ldots, x_{n+2}) \oplus x_{n+3} f_2(x_1, \ldots, x_{n+2})$$

and

$$f_2'(x_1, \ldots, x_{n+4}) = (x_{n+3} \oplus x_{n+4} \oplus 1) f_1(x_1, \ldots, x_{n+2}) \oplus (x_{n+3} \oplus x_{n+4}) f_2(x_1, \ldots, x_{n+2}) \oplus x_{n+3}.$$

By Lemma 5.1 we have constructed an (m + 2)-resilient Boolean function $f_{n+3,1} = f_1'$
on $V^{n+3}$, $nl(f_{n+3,1}) \ge 2^{n+1} + 4N_0$, and an (m + 3)-resilient Boolean function
$f_{n+4,2} = f_2'$ on $V^{n+4}$, $nl(f_{n+4,2}) \ge 2^{n+2} + 8N_0$, besides the function $f_{n+4,2}$
depends on a pair of its variables $(x_{n+3}, x_{n+4})$ quasilinearly. □

Corollary 5.1 Suppose that for $m \le n - 2$ there exist an m-resilient Boolean
function $f_{n,1}$ on $V^n$, $nl(f_{n,1}) = 2^{n-1} - 2^{m+1}$, and an (m + 1)-resilient Boolean
function $f_{n+1,2}$ on $V^{n+1}$, $nl(f_{n+1,2}) = 2^n - 2^{m+2}$, besides the function $f_{n+1,2}$
depends on some pair of its variables $(x_i, x_j)$ quasilinearly. Then there exist an
(m + 2)-resilient Boolean function $f_{n+3,1}$ on $V^{n+3}$, $nl(f_{n+3,1}) = 2^{n+2} - 2^{m+3}$,
and an (m + 3)-resilient Boolean function $f_{n+4,2}$ on $V^{n+4}$, $nl(f_{n+4,2}) = 2^{n+3} - 2^{m+4}$,
besides the function $f_{n+4,2}$ depends on some pair of its variables quasilinearly.

Proof. The hypothesis of Corollary 5.1 is the hypothesis of Lemma 5.2 for
$N_0 = 2^{n-1} - 2^{m+1}$. By Lemma 5.2 we can construct the functions $f_{n+3,1}$ and
$f_{n+4,2}$ with the required properties and nonlinearities
$nl(f_{n+3,1}) \ge 2^{n+1} + 4N_0 = 2^{n+2} - 2^{m+3}$, $nl(f_{n+4,2}) \ge 2^{n+2} + 8N_0 = 2^{n+3} - 2^{m+4}$.
By Theorem 3.1 the right parts of the last inequalities are also upper bounds.
So, we have the equalities
$nl(f_{n+3,1}) = 2^{n+2} - 2^{m+3}$, $nl(f_{n+4,2}) = 2^{n+3} - 2^{m+4}$. □

Theorem 5.1 $nlmax(n, m) = 2^{n-1} - 2^{m+1}$ for $\frac{2n-7}{3} \le m \le n - 2$.

Proof. If m = n - 2 then by Siegenthaler's Inequality any (n - 2)-resilient
function on $V^n$ is affine. So, nlmax(n, n - 2) = 0. Next, take $f_{2,1} = x_1 x_2$,
$f_{3,2} = x_1 (x_2 \oplus x_3) \oplus x_2$. These functions satisfy the hypothesis of Corollary 5.1 with
n = 2, m = -1. By Corollary 5.1 we construct the functions $f_{5,1}$ and $f_{6,2}$ such
that the function $f_{5,1}$ is a 1-resilient Boolean function on $V^5$, $nl(f_{5,1}) = 2^4 - 2^2$,
the function $f_{6,2}$ is a 2-resilient Boolean function on $V^6$, $nl(f_{6,2}) = 2^5 - 2^3$,
besides $f_{6,2}$ depends on a pair of the variables $(x_5, x_6)$ quasilinearly. Substitute
the functions $f_{5,1}$ and $f_{6,2}$ into the hypothesis of Corollary 5.1, and so on. In this
way, for each integer k, $k \ge 3$, we construct an m-resilient Boolean function $f_{n,1}$
on $V^n$ with nonlinearity $2^{n-1} - 2^{m+1}$ where n = 3k - 7, m = 2k - 7. Let
$\frac{2n-7}{3} \le m \le n - 3$. Put

$$f(x_1, \ldots, x_n) = f_{3(n-m)-7,1}(x_1, \ldots, x_{3(n-m)-7}) \oplus \bigoplus_{i=3(n-m)-6}^{n} x_i.$$

By the hypothesis of Theorem 5.1 we have $3(n - m) - 7 \le n$. The resiliency of
the function f is (2(n - m) - 7) + (n - (3(n - m) - 7)) = m, the nonlinearity of
the function f is $2^{n-(3(n-m)-7)} \left(2^{(3(n-m)-7)-1} - 2^{(2(n-m)-7)+1}\right) = 2^{n-1} - 2^{m+1}$.
Thus, for $\frac{2n-7}{3} \le m \le n - 2$ we have constructed an m-resilient Boolean function
on $V^n$ with nonlinearity $2^{n-1} - 2^{m+1}$. Taking into account the upper bound (2)
from Theorem 3.1 we complete the proof. □
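The recursion in this proof can be run on truth tables for the first few steps. The sketch below is a hedged illustration, not the author's implementation: truth tables are lists of $2^n$ bits with $x_i$ in bit i - 1 of the index, the function `step` (a hypothetical name) performs one application of Lemma 5.2 followed by Lemma 5.1, and resiliency is measured through the Walsh spectrum using the standard Xiao-Massey characterization, which is an outside fact rather than part of this paper.

```python
def walsh(tt):
    """Walsh-Hadamard spectrum of (-1)^f, computed with the fast butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for s in range(0, len(w), 2 * h):
            for k in range(s, s + h):
                w[k], w[k + h] = w[k] + w[k + h], w[k] - w[k + h]
        h *= 2
    return w


def nonlinearity(tt):
    return (len(tt) - max(abs(v) for v in walsh(tt))) // 2


def resiliency(tt):
    """Largest m with zero spectrum on all weights <= m (-1 if unbalanced)."""
    w = walsh(tt)
    n = len(tt).bit_length() - 1
    m = -1
    while m + 1 < n and all(
        w[u] == 0 for u in range(len(w)) if bin(u).count("1") == m + 1
    ):
        m += 1
    return m


def step(fa, fb, n, i, j):
    """From f_{n,1} (fa) and f_{n+1,2} (fb, quasilinear in (x_i, x_j), 1-based)
    build the truth tables of f_{n+3,1} and f_{n+4,2}."""
    others = [k for k in range(n + 2) if k not in (i - 1, j - 1)]
    f1, f2 = [], []
    for a in range(1 << (n + 2)):
        bits = [(a >> k) & 1 for k in range(n + 2)]
        idx = sum(bits[k] << pos for pos, k in enumerate(others))
        f1.append(fa[idx] ^ bits[i - 1] ^ bits[j - 1])             # f_{n,1} + x_i + x_j
        f2.append(fb[a & ((1 << (n + 1)) - 1)] ^ bits[n + 1])      # f_{n+1,2} + x_{n+2}
    inv = lambda t: [b ^ 1 for b in t]
    return f1 + f2, f1 + inv(f2) + f2 + inv(f1)                    # (4) and (5)


f21 = [0, 0, 0, 1]                    # x1 x2
f32 = [0, 0, 1, 0, 0, 1, 1, 1]        # x1 (x2 + x3) + x2, quasilinear in (x2, x3)
f51, f62 = step(f21, f32, 2, 2, 3)
f81, f92 = step(f51, f62, 5, 5, 6)    # new quasilinear pair is (x5, x6)
for name, f in [("f51", f51), ("f62", f62), ("f81", f81), ("f92", f92)]:
    print(name, resiliency(f), nonlinearity(f))
```

The values for $f_{5,1}$ and $f_{6,2}$ agree with those stated in the proof, and the next pair $f_{8,1}$, $f_{9,2}$ corresponds to k = 5 (n = 8, m = 3).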

Note that a recent conjecture $nlmax(n, n - 4) = 2^{n-1} - 2^{n-3}$ (for $n \ge 5$) in
jS] is a special case of our Theorem 5.1. 

6 Optimization of Siegenthaler’s Inequality for Each 
Individual Variable 

A shortcoming of the construction given in the proof of Theorem 5.1 is that for
$\frac{2n-7}{3} < m$ the constructed function depends on some variables linearly. Note
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that the functions with the nonlinearity 2"“^ — 2"^+^ constructed in | |12I2IBI7| 
(for m + 3 > depends nonlinear ly on all its variables only in some cases 

when TO + 3 = or to + 2 = 2"“™“^. In general, those functions depends 

nonlinear ly on 2"“™“^ + n — to — 4 or 2"“'"“^ + n — to — 3 variables. In this 
section for < m < n — log 2 — 2 we suggest a method to construct an 

m-resilient Boolean function on $V^n$ that achieves Siegenthaler's Inequality for
each of its individual variables (i.e., $\deg(f, x_i) = n - m - 1$ for all variables $x_i$).
Simultaneously we give a more general way of constructing than was done in the
previous section.

We say that a variable $x_i$ is a covering for a function f if each other variable
of f is contained together with $x_i$ in some term of maximal length in the ANF of f.
We say that a quasilinear pair of variables $(x_i, x_j)$ is a covering for a function f
if each other variable of f is contained together with $x_i$ in some term of maximal
length in the ANF of f (and consequently together with $x_j$ in some term of maximal
length in the ANF of f).

Lemma 6.1 For integers k and n provided k>3, 3k — 7 <n<3 - — 2, 

there exists a Boolean function f!f ^ on V" satisfying to the next properties: 

(1 i) fn 1 is {n — k) -resilient; 

(1 

(1 Hi) deg(/^ a;i) = fc — 1 for each variable Xi; 

(1 iv) f!f i has a eovering variable. 

For integers k and n provided k>3, 3k — 7<n<3 - 2^“^ — 2, there exists 
a Boolean function /* 2 on satisfied to the next properties: 

(2 i) f)l 2 on [n — k) -resilient; 

(2 a) nilfl2) = 2”“^ - 2"-'=+!; 

(2 Hi) deg(/* 2 )^i) = ^ — 1 for each variable Xi; 

(2 iv) /*2 has a quasilinear pair of eovering variables. 

Proof. The proof is by induction on k. For k = 3 we can take /| 1 = 2 : 1 X 2 , 
/l,i = /I.2 = a:i(x2 © X3) © X2, /|_2 = (2^1 © 2:2 )(x3 © Xi ) © xi © X3. It is easy to 
check that these functions satisfy to all required conditions. 

Suppose that the statement is valid for k. We want to prove it for fc + 1. We 
search the functions f!f\^ and 2 ^ in the form 



and 



= (2 ^n©l) (/,?i( 2 :i,...,X„J 0 X, 

V i=m+l 

/ n — 1— «2 

©2^n I 2Ji © 2 {^n—n2 7 • ■ • 7 2:y,_ i) 

V i=l 

ni + n 2 > n — 1, ni < n — 3, ri 2 < n — 2, 



(6) 



n-2 



fn,2 ~ i ^ n — 1 0 0 1) ( fm (^1 ; ■ • ■ ; 

\ i—ni+1 

/n— 2— 712 \ 

^ n — 1 0 ^n) f 0 y*n2,2 (^71— tt.2 — 1 ’ * ■ * 5 ^ti— ® ^ti— 1 1 

rii U 2 > n — 2^ ni < n — 4, n 2 < n — 3, 



( 7 ) 
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where . . . , is ^(xi, . . . , x„ J or (if fn, = fnu2 

then ri2 n — 2 in m and ri2 n — 3 in 0). Besides we suppose that a 
covering variable in is xi (or a quasilinear pair of covering variables in 2 
is (xi, X2)), and we suppose that a quasilinear pair of covering variables in 2 
is (xn_2,x„_i) in I® or (x„_3,x„_2) in ®. 

The functions and 2^ satisfy to all required properties. Indeed: 

n— 1 

(1 i) The resiliency of the function /^fyxi, . . . ,x„J 0 Xi is (ni — k) + 

n— 1— ri2 

(n— 1 — ni) = n— fc— 1, the resiliency of the function 0 Xi 0 2i^n-n2, 

i —1 

. . . , Xn-i) is n — 1 — U2 + (u2 — k) = n — k — 1. So, by Lemma 5.1 the resiliency 
of the function is n — (fc 0 1). 

n— 2 

(2 i) The resiliency of the function /,^fyxi, . . . ,x„J 0 Xi is (ni — k) + 

n— 2— ri2 

(n — 2 — m) = n — fc — 2, the resiliency of the function 0 Xi(B fn2,2i^n-n2-i, 

i —1 

. . . , Xn-2) is n — 2 — U2 + (n-2 — k) = n — k — 2. So, by Lemma 5.1 the resiliency 
of the function is n — (fc 0 1). 

n— 1 

(1 ii) The nonlinearity of the function /^fyxi, . . . ,x„J 0 Xi is ( 2 "i“^ — 

n— l—ri2 

2"i-^+i)2"-^-"i = 2"“^ — 2"“^, the nonlinearity of the function 0 Xi 0 

i=l 

/42(a;«-n2, is = 2 "- 2 - 2 "-^ The function 

n— 1 

fn^ (X1,...,X ni ) © Xi depends on variables Xn-2 and x„_i linearly whereas 

i—n\-\-l 
n— 1— ri2 

the function 0 Xj 0 2{^n-n2j ■ ■ ■ j Xn-i) depends on a pair of variables 

2 — 1 

(xn_2,x„_i) quasilinear ly. So, by Lemma 5.1 the nonlinearity of the function 

^fc+1 jg 2 ^n —2 _j_ ^ 2 ^n —2 Q^n—k'j Q^n—l Q^n— 

n— 2 

(2 ii) The nonlinearity of the function /*^(xi, . . . ,x„J 0 Xi is ( 2 ”i“^ — 

2=m +1 

n— 2— ri2 

2"i-^+i)2"“^-"i = 2"“^ — 2’^“*“^, the nonlinearity of the function 0 Xi 0 

i=l 

fn2,2{Xn-n2-l, • ■ ■ , ^n-2) is equal tO 2— 2 — 2 (2"2 - 1 _ 2^2- fc+ 1 ) = 2 — 3 - 2 "-'=-!. 

n—2 

The function (xi, . . . , x^fy 0 Xi depends on variables x„_3 and Xn-2 lin- 

i—n\-\-l 
n— 1— n2 

early whereas the function 0 Xj 0 2{^n-n2 > • • • > 2:^-1) depends on a pair 

i —1 

of variables (x„_3, Xn-2) quasilinear ly. So, by Lemma 5.1 the nonlinearity of the 
function is 2"-^ 0 2 ( 2—3 _ 2^-*-!) ^ 2— ^ - 2"-('=+i)+i. 

(1 iii), (1 iv) Each variable from the set {x2, X3, . . . , x„fy is contained together 
with xi in some term of length /c — 1 in ANF of the function i(xi , . . . , x„fy if 
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/ni = fni 1 variable from the set { 2 : 3 , X 4 ,. . . , } is contained together 

with x\ in some term of length k —1 (and also together with X 2 in some term of 
this length) in ANF of the function • ■ • > if fn^ = fni ,2 - The function 

n— l—ri 2 

® Xi(Bfn 2 2 (^n-n 2 : • ■ ■ j Xn-i) depends on the variable x\ linearly (and also 

on the variable X 2 if fn^ = fni, 2 )- So, after the removing of the parentheses and 
the reducing of similar terms each variable from the set {x\,X 2 , X 3 , . . . , x^} will 
be contained together with Xn in some term of length k in ANF of the function 
Analogously, each variable from the set . . . , a:„_ 3 } is contained 

together with Xn -2 in some term of length k — 1 (and also together with Xn-i 
in some term of such length) in ANF of the function 2 i^n-n 2 > ■ ■ ■ 

n— 1 

The function /,^^(xi, . . . ,Xm) 0 Xi depends on the variables Xn -2 and Xn-i 

2 =;ni + l 

linearly. So, after the removing of the parentheses and the reducing of similar 
terms each variable from the set • ■ • iXn-i\ will be contained together 

with Xn in some term of length k in ANF of the function fn~i^- By condi- 
tion rii -I- ri 2 < n — 1, therefore the union of the sets {x\,X 2 ,x^, . . . ,Xm} and 
{xn-n 2 i ■ ■ ■ jXn-i} is the Set {xi, . . . ,Xn-i}- Thus, Xn is a covering variable in 
The proof of properties (2 iii) and (2 iv) is analogous. 

Finally, we note that according to (0 we can construct the function i 
if n > rii -I- 3 > (3/c — 7) -I- 3 = 3(fc -I- 1) — 7 and if n < rii -|- ri 2 -I- 1 < 

2(3 • 2^“^ — 2) -I- 1 < 3 • — 3, and according to d7]) we can construct the 

function /* 2 T n > ni -|- 4 > (3fc — 7)-|-4 = 3(fc-|-l)— 4 and if n < rii -|- ri 2 -I- 2 < 

2(3 • 2^“^ — 2) -f 2 < 3 • — 2. So, the step of induction is proven. □ 

Theorem 6.1 For integers m and n provided < m < n — log 2 — 2, 

there exists an m-resilient Boolean function on V'^ with nonlinearity 2"“^ — 2™+^ 
that achieves Siegenthaler’s Inequality for each individual variable. 

Proof. Straightforward corollary of Lemma 6.1. □

Abstract. This paper presents a new attack, called the Decimation Attack,
on most stream ciphers. It exploits the property that multiple clocking
(or equivalently d-th decimation) of an LFSR can simulate the behavior
of many other LFSRs of possibly shorter length. It then yields significant
improvements over all previously known correlation and fast correlation
attacks. A new criterion on the length of the feedback polynomial is then
defined to resist the decimation attack. Simulation results and complexity
comparisons are detailed for ciphertext-only attacks.



Keywords: stream cipher, linear feedback shift register, correlation attack, fast 
correlation attack, sequence decimation, multiple clocking. 

1 Introduction 

Despite the growing importance of block ciphers, stream ciphers remain a very
important class of cipher systems, mainly used in the governmental world.

In a binary additive stream cipher, the ciphertext is obtained by bitwise
addition of the plaintext to a pseudo-random sequence called the running key.
The latter is produced by a pseudo-random generator whose initial state
constitutes the secret key. Most real-life designs center around Linear
Feedback Shift Registers (LFSRs) combined by a nonlinear Boolean function.
Different variants exist: clock-controlled systems, filter generators,
multiplexed systems, memory combiners, decimated generators, etc. This paper
focuses on the most common class, the combination generators depicted in
Figure 1.

The cryptanalyst's problem is usually that of recovering the initial
states of some LFSRs, assuming that the structure of the generator is known to
him. The state of the art in generic stream cipher cryptanalysis can be
summarized as follows: correlation attacks and fast correlation attacks. Both
exploit an existing correlation (of order $k$) between the running key $\sigma$
and some linear combination of the $k$ input variables
$x_{i_1}, \ldots, x_{i_k}$. A so-called divide-and-conquer attack is conducted,
which tries to recover the initial state of the $k$ target LFSRs independently
of the other unknown key bits. Such correlations always exist, but functions
offering good cryptographic resistance generally have a high correlation order
$k$, thus forcing the cryptanalyst to consider several LFSRs simultaneously [7].
In this case the $k$ distinct LFSRs can be seen as a single LFSR of length $L$;
the length and the feedback polynomial $P$ of this LFSR can be derived from the
feedback polynomials of the constituent LFSRs [?]. In the following we will
generalize by speaking of the single target LFSR of length $L$. A single LFSR
will then be the particular case $k = 1$.

Fig. 1. Nonlinear combination generator

- In correlation attacks [18, 19], the $2^L - 1$ possible initializations of the
  LFSR are exhaustively tried and each time the corresponding produced sequence
  $x$ is compared to the captured running key $\sigma$. The initialization
  yielding the statistical bias closest to the expected one is assumed to be
  the correct one. This attack is limited by the length $L$ of the target LFSR
  (at present $L \approx 50$) since it requires $2^L - 1$ trials, and practical
  attacks can only deal with correlation order $k = 1$ (i.e. considering only
  one LFSR). Moreover, the longer the LFSR and the lower the correlation value,
  the longer the necessary running key sequence. This attack is nowadays no
  longer efficient except for weak schemes.

- Fast correlation attacks were introduced by Meier and Staffelbach [?] and
  avoid examining all possible initializations of the LFSR. The output of the
  target LFSR is considered to have passed through a noisy channel, most
  frequently modelled by the Binary (Memoryless) Symmetric Channel (BSC) with
  some error probability $p < 1/2$, where $\varepsilon = 1/2 - p$ is usually
  very small. In this setting, an LFSR output sequence of fixed length $N$ can
  be considered as an $[N, L]$ binary linear code. Each of its codewords can be
  uniquely related to a possible initialization. The cryptanalyst's problem
  then becomes a decoding problem in the presence of a BSC with strong noise.
  The Meier and Staffelbach attack uses an iterative decoding process for
  low-density parity-check codes when the feedback polynomial is of low weight.
  Minor improvements have since been obtained [?]. Johansson and Jonsson
  recently presented powerful new ideas by considering convolutional codes [2]
  and turbo codes [10]. Canteaut and Trabbia in [3] show that Gallager
  iterative decoding [8] with parity-check equations of weight 4 and 5 is
  usually more efficient than these previous attacks, since it successfully
  decodes under very high error probabilities with relatively feasible time and
  memory complexity. Finally, Chepyzhov, Johansson and Smeets [5] recently
  presented a new, powerful fast correlation attack allowing a significant
  reduction of memory requirements and complexity. The key idea is to use a
  code of dimension $K < L$ to improve decoding complexity.

All these attacks have the length $L$ of the target LFSR as their major
limitation (particularly when the correlation order $k$ is high). As soon as
$L$ becomes too large and $\varepsilon$ too small, the memory and complexity
requirements explode, making these attacks infeasible. This paper shows how
this limitation can possibly be bypassed. By considering d-fold clocking (or
d-th decimation) of the target LFSR output sequence, we show how to use a
simulated shorter LFSR, thus improving the known attacks. This approach is
particularly efficient when dealing with long LFSRs or with the combining
functions of high correlation order found in real-life designs. With a
relatively longer sequence we suppress the memory requirements and
significantly reduce the complexity of the Canteaut/Trabbia attack [3], and
obtain a slightly better complexity than the Chepyzhov, Johansson and Smeets
attack. Moreover, the feasibility of this attack is determined only by the
LFSR length and not by the feedback polynomial, which however must be known
(directly or after reconstruction [?]). We will consider only ciphertext-only
attacks to make this attack more realistic.

This paper is organized as follows. Section 2 presents the theoretical tools we
use in this attack. In Section 3 the Decimation Attack (DA) itself is described
and simulation results for different cases are given. A new criterion is then
defined, allowing the choice of LFSRs resisting this attack. Section 4 compares
the decimation attack with the two best known attacks of [3, 5].

2 Decimation of LFSR Sequences 

A linear feedback shift register of length $L$ is characterized by $L$ binary
connection coefficients $(p_i)_{1 \le i \le L}$. It associates to any $L$-bit
initialization $(s_t)_{1 \le t \le L}$ a sequence $(s_t)_{t \ge 0}$ defined by
the $L$-th order linear recurrence relation

$$ s_{t+L} = \sum_{i=1}^{L} p_i \, s_{t+L-i}, \qquad t \ge 0 . $$

The connection coefficients are usually represented by a univariate polynomial
$P$ over $F_2$, called the feedback polynomial:

$$ P(X) = 1 + \sum_{i=1}^{L} p_i X^i . $$

Most applications use a primitive feedback polynomial to ensure that the periods
of all sequences produced by the LFSR are maximal.

Let us consider a sequence $\sigma = \sigma_1, \sigma_2, \ldots$ produced by an
LFSR of length $L$ whose feedback polynomial is irreducible in $GF(q)[X]$.
Suppose now that we sample $\sigma$ at intervals of $d$ clock cycles (d-fold
clocking), thus producing a subsequence $\rho = \rho_1, \rho_2, \ldots$. In
other words, this is equivalent to the d-decimation of the original sequence
$\sigma$. Thus we have $\rho_j = \sigma_{dj}$ for $j = 0, 1, 2, \ldots$, or in
sequence notation $\rho = \sigma[d]$.

With d-fold clocking, the original LFSR will behave like a different LFSR,
which is called the simulated LFSR [16]. The interesting question is to
determine its properties, especially relative to the original LFSR. They are
summarized in the following proposition.

Proposition 1 [16, page 146] Let $\sigma$ be the sequence produced by the
original LFSR whose feedback polynomial $P(x)$ is irreducible in $GF(q)$ of
degree $L$. Let $\alpha$ be a root of $P(x)$ and let $T$ be the period of
$P(x)$. Let $\rho$ be the sequence resulting from the d-th decimation of
$\sigma$, i.e. $\rho = \sigma[d]$. Then the simulated LFSR, that is the LFSR
directly producing $\rho$, has the following properties:

1. The feedback polynomial $P^*(x)$ of the simulated LFSR is the minimum
   polynomial of $\alpha^d$ in $GF(q^L)$.

2. The period $T^*$ of $P^*(x)$ is equal to $T / \gcd(d, T)$.

3. The degree $L^*$ of $P^*(x)$ is equal to the multiplicative order of $q$ in
   $Z_{T^*}$.

Moreover all $d$ in $C_k$, where $C_k = \{k, kq, kq^2, \ldots\} \bmod T$ denotes
the cyclotomic coset of $k$ modulo $T$, result in the same simulated LFSR,
except for different initial contents. Finally, every sequence producible by
the simulated LFSR is equal to $\sigma[d]$ for some choice of the initial
contents of the original LFSR.

In real applications, $P(x)$ is primitive, so $T = 2^L - 1$. Thus when $T$ is
not prime, by a careful choice of $d$ such that $\gcd(2^L - 1, d) \ne 1$ one may
expect to obtain a simulated LFSR shorter than the original one.

Example 1 Let us consider the original LFSR with a primitive feedback
polynomial $P(x)$ of degree 4, so that $T = 15$.

- $d \in C_3 = \{3, 6, 9, 12\}$: $T^* = 5$ and $L^* = 4$, since the
  multiplicative order of 2 in $Z_5$ is 4.

- $d \in C_5 = \{5, 10\}$: $T^* = 3$ and $L^* = 2$, since the multiplicative
  order of 2 in $Z_3$ is 2 (and $P^*(x) = x^2 + x + 1$).
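
The quantities $T^*$ and $L^*$ of Proposition 1 can be checked numerically. The
following is a minimal sketch, assuming a primitive feedback polynomial (so
that $T = q^L - 1$); the function names are ours and purely illustrative.

```python
from math import gcd


def multiplicative_order(q: int, n: int) -> int:
    """Smallest k >= 1 such that q**k == 1 (mod n)."""
    if n == 1:
        return 1  # degenerate case: every residue is 0 in Z_1
    if gcd(q, n) != 1:
        raise ValueError("q must be invertible modulo n")
    k, power = 1, q % n
    while power != 1:
        power = (power * q) % n
        k += 1
    return k


def simulated_lfsr(L: int, d: int, q: int = 2) -> tuple:
    """Return (T*, L*) for the d-th decimation, assuming T = q**L - 1."""
    T = q**L - 1
    T_star = T // gcd(d, T)                          # period of the simulated LFSR
    return T_star, multiplicative_order(q, T_star)   # its degree L*


def cyclotomic_coset(k: int, T: int, q: int = 2) -> list:
    """Cyclotomic coset {k, kq, kq^2, ...} mod T, returned sorted."""
    coset, x = [], k % T
    while x not in coset:
        coset.append(x)
        x = (x * q) % T
    return sorted(coset)
```

For the degree-4 case of Example 1, `simulated_lfsr(4, 3)` and
`simulated_lfsr(4, 5)` give the pairs $(T^*, L^*)$ listed there. The order loop
terminates after at most $L$ steps because $T^*$ divides $q^L - 1$.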

The feedback polynomial $P^*(x)$ of the simulated LFSR can be obtained by
applying the Berlekamp-Massey LFSR synthesis algorithm [?] to the sequence
$\rho$.
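
As an illustration of this synthesis step, here is a minimal sketch of the
Berlekamp-Massey algorithm over GF(2), applied to decimated sequences of a
small LFSR. The choice $P(x) = 1 + x + x^4$ and the initial state are
assumptions made for the example only; the helper names are ours.

```python
def lfsr_sequence(taps, state, n):
    """n bits of s[t] = XOR of s[t-i] for i in taps (taps = {i : p_i = 1})."""
    s = list(state)
    while len(s) < n:
        s.append(sum(s[len(s) - i] for i in taps) % 2)
    return s[:n]


def berlekamp_massey(s):
    """Shortest LFSR generating the bit list s.

    Returns (L, c) where c = [1, c_1, ..., c_L] are the coefficients of the
    connection polynomial 1 + c_1 x + ... + c_L x^L.
    """
    n = len(s)
    c = [0] * (n + 1)
    b = [0] * (n + 1)
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # discrepancy between s[i] and the prediction of the current LFSR
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:          # the register length has to grow
                L, m, b = i + 1 - L, i, previous
    return L, c[:L + 1]


# Assumed toy register: P(x) = 1 + x + x^4, initial state 1000 (period 15).
sigma = lfsr_sequence([1, 4], [1, 0, 0, 0], 60)
rho3 = sigma[::3]   # 3-decimation
rho5 = sigma[::5]   # 5-decimation
```

With this toy register, `berlekamp_massey(rho5[:12])` returns a register of
length 2 with connection polynomial $1 + x + x^2$, in agreement with Example 1.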

Finally, we recall that a combination generator is vulnerable to correlation
attacks [18, 19] if the output of the combining function statistically depends
on one of its inputs. More generally, Siegenthaler [?] introduced the following
criterion:

Definition 1 A Boolean function is t-th order correlation-immune (denoted
CI(t)) if the probability distribution of its output is unaltered when any t
input variables are fixed.

This property equivalently asserts that the output of $f$ is statistically
independent of any linear combination of $t$ input variables. If $f$ is CI(t)
then $f$ is CI(k) for any $k < t$ as well. The correlation-immunity order of a
function $f$ is then the highest integer $t$ such that $f$ is CI(t).
Equivalently, $f$ is said to be $(t+1)$-th order correlated. Practically
speaking, if $f$ is CI(t), then to exploit an (always) existing correlation, we
have to consider at least $t + 1$ LFSRs at the same time [7].

3 The Decimation Attack

3.1 Description of the Algorithm

To deal with a more realistic attack, we will consider only ciphertext-only
attacks. This limits the drawback of the increase in required ciphertext
length. The BSC with error probability $p$ then models at the same time the
Boolean function correlation $P[x_i = \sigma] = p_k$ and the bit probability of
the plaintext $p_0 = P[m_t = 0]$. We take $p_0 = 0.65$, which corresponds to
most real-life cases. Then we have $p = p_0 + p_k - 2 p_0 p_k$.
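
The combined error probability can be checked with a short computation. The
sketch below assumes that the plaintext bits and the correlation noise are
independent; the function name is ours. A ciphertext bit differs from the LFSR
output bit exactly when one, and only one, of the two noise sources flips it.

```python
def combined_error_probability(p0: float, pk: float) -> float:
    """P[c_t != x_t] for c = m XOR sigma, with P[m = 0] = p0, P[sigma = x] = pk.

    Assumes the plaintext bit and the correlation noise are independent.
    """
    # either the plaintext bit is 0 and the running key disagrees with x,
    # or the plaintext bit is 1 and the running key agrees with x
    return p0 * (1 - pk) + (1 - p0) * pk


# Same value written as in the text: p = p0 + pk - 2 * p0 * pk
p = combined_error_probability(0.65, 0.53)
```

With $p_0 = 0.65$ and $p_k = 0.53$ this gives $p = 0.491$, i.e. a channel that
is only slightly better than random.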

Let there be a target LFSR of length $L$ such that there exists $d \mid 2^L - 1$
satisfying Proposition 1. In all these cases (see Section 3.3) the best $d$
yields a simulated LFSR of length at most $L/2$.

With the decimated ciphertext sequence $\rho = \sigma[d]$ we try to recover the
corresponding output sequence of the simulated LFSR, which is the decimated
sequence $x_i[d]$ of the (original) target LFSR output sequence $x_i$. This
simply consists of a Siegenthaler attack [?] on this simulated LFSR (see
Figure 1). It then needs only $2^{L^*}$ exhaustive searches.
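
The first step of the attack can be sketched on a toy register. Everything
specific below is an assumption made for illustration: the register
$P(x) = 1 + x + x^4$, the decimation factor $d = 5$ (simulated LFSR
$P^*(x) = 1 + x + x^2$), the noise level and the function names. The noise
stands for the combined effect of the plaintext and of the combining function.

```python
import random


def lfsr_sequence(taps, state, n):
    """n bits of s[t] = XOR of s[t-i] for i in taps."""
    s = list(state)
    while len(s) < n:
        s.append(sum(s[len(s) - i] for i in taps) % 2)
    return s[:n]


def decimation_attack(ciphertext, d, sim_taps, sim_len):
    """Exhaustive search over the non-zero states of the simulated LFSR.

    Returns the candidate state agreeing most often with the decimated
    ciphertext, together with its number of agreements.
    """
    rho = ciphertext[::d]                       # d-th decimation
    best_state, best_score = None, -1
    for value in range(1, 2 ** sim_len):
        state = [(value >> i) & 1 for i in range(sim_len)]
        candidate = lfsr_sequence(sim_taps, state, len(rho))
        score = sum(1 for a, b in zip(candidate, rho) if a == b)
        if score > best_score:
            best_state, best_score = state, score
    return best_state, best_score


# Toy experiment: 300 decimated bits observed through a BSC with p = 0.25.
rng = random.Random(2000)
x = lfsr_sequence([1, 4], [1, 0, 0, 0], 5 * 300)
noisy = [bit ^ (rng.random() < 0.25) for bit in x]
state, score = decimation_attack(noisy, 5, [1, 2], 2)
```

Only $2^{L^*} - 1 = 3$ candidates are tried here instead of $2^L - 1 = 15$; the
recovered state is the initial content of the simulated register, not yet that
of the original one.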

Any kept $L^*$-bit candidate is used to generate $L$ bits of the decimated
sequence $x_i[d]$. Since each bit of the sequence $x_i[d]$ is a bit of the
sequence $x_i$ as well, it can be described by two different equations. One has
$L^*$ variables (the bit is seen as a bit of the sequence $x_i[d]$). The other
has $L$ variables (the bit is seen as a bit of the sequence $x_i$). Example 2
illustrates this.

The system of equations in $L$ variables obviously has rank $L^*$. Then, by
taking $L^*$ principal variables, we can express them in terms of the
$L - L^*$ other variables taken as parameters. An additional $(L - L^*)$-bit
exhaustive search on these parameters allows us to retrieve the correct $L$-bit
initialization.

Note that in all our experiments, the correct $L^*$-bit initialization of the
simulated LFSR was always detected within the first few best estimator values
(see below), thus ensuring detection and limiting the cost of the additional
exhaustive search. However, to guard against a less favourable detection, the
first step of $L^*$-bit exhaustive search is conducted on a few other shifted
decimated sequences $\sigma_0[d], \sigma_1[d], \ldots, \sigma_k[d]$, where
$\sigma_i[d]$ means that we decimate $\sigma$ starting from index position $i$
($0 \le i < d$). Sorting the results always allowed us to detect the correct
$L^*$-bit initialization (we can additionally test the very few kept $L$-bit
initializations on the sequence $\sigma$). Good experimental values are
$2 \le k \le L^*$.

We compute for each of the $2^{L^*}$ possible initializations the value of the
estimator

$$ E = \sum_{t=1}^{N} \left( \rho_t \oplus x_t[d] \oplus 1 \right), $$

that is, the number of positions in which the candidate sequence agrees with
the decimated ciphertext. For the correct $L^*$-bit initialization, $E$ has a
Gaussian distribution with mean value $N(1-p)$ and variance $Np(1-p)$
(hypothesis $H_1$), whilst for all the wrong ones $E$ has a Gaussian
distribution with mean value $N/2$ and variance $N/4$ (hypothesis $H_0$), where
$N$ is the minimum number of required ciphertext bits of $\sigma[d]$.

A decision threshold $T$ discriminates $H_0$ from $H_1$. If $|E| > T$ then
$H_1$ is accepted and the initialization is kept; otherwise $H_0$ is chosen and
the initialization is rejected. The minimum number $N$ of required ciphertext
bits of $\sigma[d]$ depends on the number of wrong decisions that we accept.
This number (as well as the threshold $T$) is determined by the false alarm
probability ($p_{fa} = P[E > T \mid H_0]$) and the non-detection probability
($p_{nd} = P[E < T \mid H_1]$). If $\Phi$ denotes the normal distribution
function

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} \, du $$

then we have

$$ p_{fa} = 1 - \Phi(a), \quad a = \frac{T - N/2}{\sqrt{N}/2},
   \qquad
   p_{nd} = \Phi(b), \quad b = \frac{T - N(1-p)}{\sqrt{N(1-p)p}} . $$

Finally we obtain

$$ N = \left( \frac{2b\sqrt{p(1-p)} - a}{2p - 1} \right)^2 . \qquad (1) $$

In terms of ciphertext we then need $N \cdot d$ bits. The global complexity of
the attack is then in $O(N \cdot 2^{L^*})$ with no memory requirements.
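
The dependence of $N$ on $p$, $p_{fa}$ and $p_{nd}$ can be explored with the
sketch below. It rests on assumptions of ours: the estimator counts agreements,
so that it has mean $N/2$ and variance $N/4$ for a wrong candidate and mean
$N(1-p)$ and variance $Np(1-p)$ for the correct one, and $a$, $b$ are the
normal quantiles defined by $p_{fa} = 1 - \Phi(a)$ and $p_{nd} = \Phi(b)$.
Absolute values obtained this way are indicative only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_bits(p: float, pfa: float, pnd: float) -> tuple:
    """Return (N, T): number of decimated bits and decision threshold.

    Assumes 0 < p < 1/2 and pfa, pnd < 1/2, and the Gaussian model above.
    """
    inv = NormalDist().inv_cdf
    a = inv(1.0 - pfa)        # pfa = 1 - Phi(a)
    b = inv(pnd)              # pnd = Phi(b), negative for pnd < 1/2
    # Equating T = N/2 + a*sqrt(N)/2 and T = N(1-p) + b*sqrt(N p (1-p)):
    root_n = (2.0 * b * sqrt(p * (1.0 - p)) - a) / (2.0 * p - 1.0)
    n = ceil(root_n ** 2)
    threshold = n / 2.0 + a * sqrt(n) / 2.0
    return n, threshold


n_easy, t_easy = required_bits(0.45, 0.1, 0.1)
n_hard, t_hard = required_bits(0.49, 0.1, 0.1)
```

The required length grows roughly as $1/(1-2p)^2$: moving from $p = 0.45$ to
$p = 0.49$ multiplies $N$ by about 25 under these assumptions.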



3.2 Simulation Results 



Now let us present simulation results of our attack. 



Example 2 CI(0) function. 

Let us consider a LFSR of length L = 40 with feedback polynomial: 



P{x) = l + x^ + x‘^ + x^ + x^ + x^ + + x^^ + + x^^ + + 



„24 



„25 



„26 



„27 



.,28 



„29 



„35 



„40 



Consider a BSC with noise probability $p = 0.49$ ($p_0 = 0.65$ and
$P[x_t \ne y_t] = 0.47$) modelling a CI(0) Boolean function and the plaintext
noise at the same time. Since
$2^{40} - 1 = 3 \cdot 5^2 \cdot 11 \cdot 17 \cdot 31 \cdot 41 \cdot 61681$, by
choosing $d = 1,048,577$ as decimation factor for the LFSR output sequence
$\sigma$, we obtain a simulated shorter LFSR of length $L^* = 20$ with feedback
polynomial:

F'(x) = l-|-x-|-x-|-x-|-x-|-x-|-x-|-x -l-x -l-x -l-x -l-x -l-x 

With $p_{nd} = 0.1$ and $p_{fa} = 0.1$, we then need by Equation (1)
$N = 13050$ bits of the decimated sequence $\rho = \sigma[d]$, that is to say
$N \cdot d \approx 2^{34}$ bits of ciphertext. The complete computation (i.e.
recovering the correct initial state, with $k = 2$) required about 15 minutes
on a Pentium II at 400 MHz with less than 1 MB of memory.

Note that the 20,971,541st bit (for example), when seen as belonging to the
sequence $x$, is described by the following 40-variable equation (where $x_i$,
$0 \le i \le 39$, denotes the $i$-th unknown bit of the $L$-bit
initialization):

x_{20,971,541} = x_2 + x_4 + x_6 + x_7 + x_14 + x_16 + x_17 + x_19 + x_20 + x_22 + x_23
               + x_25 + x_26 + x_28 + x_29 + x_30 + x_33 + x_36 + x_37

whilst, when seen as the 21st bit of $x[d]$, we have the following 20-variable
equation (where $y_i$, $0 \le i \le 19$, denotes the $i$-th unknown bit of the
$L^*$-bit initialization):

^20, 971, 541 = yo + 2/1 + 2/2 + ys + 2/6 + y? + ys + yii + yi2 + yis + yi 5 + yis 



Example 3 CI(1) function.

Consider now two LFSRs, $P_1$ of length $L_1 = 34$ and $P_2$ of length
$L_2 = 39$, combined by a CI(1) Boolean function.



Pi{x) = l + x^ + x^ + x^ + x^ + x^^+ x^^ + + x^"> + x^« + x"*" + X 



„21 



X^^ + X^'^ + x"^^ + X^® + X^° + X^^ + X^^ + X^‘^ 



P2(x) = 1 + X + x^ + x'^ + x"‘ + X^ + X^'‘ + X^'" + X^*" + X^^ + X^^ + X^^ + X 

+ x^^ + x'^'^ + X^^ + X^® + X^® + X®^ + X®^ + X®® + X®”^ + X®® 



„24 



We take the same BSC model as in Example 2 but for a CI(1) Boolean function.
This implies that we must try all the possible initial states of at least two
LFSRs at the same time, so that previously known attacks have to consider a
single target LFSR of length 73. The best decimation factor in this case is
$d = 131,073$, which divides $2^{34} - 1$. So we obtain these two new simulated
LFSRs:



P( (x) = 1 + X + x^ + x^ + X® + x^ + X® + x^® + x^^ + x^^ + x^® + x^® + 



„17 



p; (x) = 1 + X + x" + X® + x'^ + x^4 + X®® + x^'^ + x^® + x"® + x"" + x"^ + X 



„17 



„24 



„28 



„30 



„32 



„34 



„39 



With pnd = 0.1 and pfa = 0.1, we then need by EquationU^ N = 13050 bits 
of decimated sequence p = cr[cf| that is to say N ■ d = 2®® bits of ciphertext. 
Complexity is in 0(2®®) with less than 1 Mo of memory. Partial experiment 
(k = 8) have been conducted on a PLL 4.OO Mhz to verify that final pfa was 
effectively close to zero. 

Note that rank of the system in L variables is 56, thus with 17 parameters. 
Then the additional exhaustive search is on 17 bits and is negligible compared to 
the first exhaustive search step. 



3.3 Decimation Attack Resistance Criterion 

By direct use of Proposition 1 we can define the following criterion for
Decimation Attack resistance.

Proposition 2 Let $L \in N$. Any feedback polynomial of degree $L$ will resist
the decimation attack if and only if, for all $d < 2^L - 1$ such that
$d \mid (2^L - 1)$, the multiplicative order of 2 in $Z_{T^*}$ is equal to $L$,
where $T^* = (2^L - 1)/d$.

We then obviously have

Corollary 1 If $T = 2^L - 1$ is prime then any feedback polynomial of length
$L$ will resist the decimation attack.

In fact, finite field theory allows us to state this criterion more simply.

Theorem 1 (Subfield Criterion) [11] Let $F_q$ be a finite field with $q = p^n$
elements. Then every subfield of $F_q$ has order $p^m$ where $m$ is a positive
divisor of $n$. Conversely, if $m$ is a positive divisor of $n$, then there is
exactly one subfield of $F_q$ with $p^m$ elements.

Then we can easily state the resistance criterion as follows:

Proposition 3 Any feedback polynomial of length $L$, such that $L$ is prime,
will resist the decimation attack.

Proof. Straightforward by direct application of Theorem 1 and Proposition 2.

Resistance is thus ensured as soon as the field $F_{2^L}$ contains no proper
subfield other than $F_2$. It then becomes obvious that, when the decimation
attack is possible, there exists a value of $L^*$ at most equal to $L/2$.
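
The subfield criterion translates directly into a search for usable decimation
factors. The sketch below is our own illustration, assuming a primitive
feedback polynomial: for every divisor $m$ of $L$ with $1 < m < L$, the factor
$d = (2^L - 1)/(2^m - 1)$ gives $T^* = 2^m - 1$ and hence $L^* = m$. It lists
one representative $d$ per subfield, not every admissible factor.

```python
def decimation_options(L: int) -> list:
    """Pairs (L*, d) with d = (2^L - 1) / (2^L* - 1), one per proper subfield.

    Assumes a primitive feedback polynomial of degree L, so that T = 2^L - 1.
    """
    T = 2 ** L - 1
    return [(m, T // (2 ** m - 1)) for m in range(2, L) if L % m == 0]


def resists_decimation(L: int) -> bool:
    """True when no shorter simulated LFSR exists, i.e. when L is prime."""
    return L > 1 and not decimation_options(L)


options_40 = decimation_options(40)
best_length_40 = max(m for m, _ in options_40)   # shortest search is min(), longest L* is max()
```

For $L = 40$ the largest admissible $L^*$ is $20 = L/2$, obtained with
$d = 2^{20} + 1 = 1,048,577$; for $L = 34$ the factor $d = 2^{17} + 1 =
131,073$ gives $L^* = 17$.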

Note that this criterion is new and must not be confused with that of relative
primality of periods when considering the sum of outputs of LFSRs, as described
by the following theorem:

Theorem 2 [11, p. 224] For each $i = 1, 2, \ldots, h$, let $\sigma_i$ be an
ultimately periodic sequence in $F_q$ with least period $r_i$. If
$r_1, r_2, \ldots, r_h$ are pairwise relatively prime, then the least period of
the sum $\sigma_1 + \sigma_2 + \cdots + \sigma_h$ is equal to the product
$r_1 r_2 \cdots r_h$.

When designing a stream cipher system, one must carefully check the degree of
the chosen feedback polynomials (in connection with the correlation order) to
resist the Decimation Attack. The following table gives some values of $L$
satisfying this new resistance criterion ("prime" relates to Corollary 1 and
"order" relates to Proposition 2). A complete list of parameters $d$, $T^*$ and
$L^*$ up to $L = 256$ has been computed and can be found in [?].



Comment   L
--------  ---------------------------------------------------------------
Prime     5 - 7 - 13 - 17 - 19 - 31 - 61 - 89 - 107 - 127
Order     11 - 23 - 29 - 37 - 41 - 43 - 47 - 53 - 59 - 67 - 71 - 73
          83 - 97 - 101 - 109 - 113 - 131 - 139 - 149 - 151 - 157 - 163
          167 - 173 - 178 - 179 - 181 - 191 - 193 - 197 - 199
          211 - 223 - 227 - 229 - 233 - 239 - 241 - 251



4 Comparison with Previous Known Attacks 

4.1 Canteaut and Trabbia Attack 

Canteaut and Trabbia (CT) in [3] recently improved fast correlation attacks on
stream ciphers. They considered parity-check equations of weight
$\delta = 4, 5$ with Gallager iterative decoding. However, the main drawback of
this approach, despite its relative efficiency, remains the huge amount of
memory required both for the preprocessing (generation of the parity-check
equations) and for the decoding steps, especially for long LFSRs [2]. The
latter step, though very efficient, still requires higher complexity than is
suitable for frequent key recovery.

We now compare our attack (DA) with the CT attack. Suppose that by applying the
condition of Proposition 1 a reduction of $\Delta L$ bits on the exhaustive
search is possible. Then our attack has complexity
$C_{DA} = O(N \cdot 2^{L - \Delta L})$ (where $N$ is given by Equation (1)) and
requires a negligible amount of memory. Let us now compute the complexity gain
over the CT attack. In [3] the complexity is given by the following formula:



where 5 is the weight of the parity-check equation (3 < 5 < 5) and Cs- 2 {p) is 
the BSC capacity (with overall error probability ps -2 = 5(1 — (1 — 2p)^“^)) i.e. 
Cs- 2 {p) = 1 + P5-2log(p5-2) + (1 - P<5-2)log(l - ps- 2 )- The value « 1 if 
^ > 4 and ATa « 2. Finally as{p) = ^ log2[(5 - 
The complexity gain of our attack is then 



^DA 



Ks 

N-Cs-2{p) 



2°‘s(p) + j^+^L-L-1 



For relatively short LFSRs, the complexity gain is limited (provided $\delta$
is small) and the ciphertext sequence is longer for DA, as illustrated by
Table 1. It is



Table 1. Comparison on Example 2





Decimation Attack 


CT Attack (3 = 5) 


Ciphertext bits 


2-ii 


22U 


Complexity 


2^1 


2^5 


Memory (bytes) 


< 1024 


2^JS 



easy to see that the gain increases with $\delta$. Now, using higher weights is
precisely the central point of the CT attack. Moreover, the gain increases with
the error probability $p$. These two facts reinforce the interest of our
technique when dealing with longer LFSRs and the higher error probabilities of
real-life cases. Table 2 gives a comparison for this case ("ppm" means
preprocessing with memory and "ppwm" means preprocessing without memory).
Concerning the memory

requirement, CT attack needs {S — 1) ^ Cs^ 2 (p ) ) -1-2“'’^^^^^^“''^ computer words 
of memory. This once again increases with S and p. 

Note that the preprocessing step in the CT attack must be done once and for
all, but for each different feedback polynomial of length $L$. On the contrary,
the Decimation Attack applies to all polynomials of degree $L$ (for suitable
$L$) and no preprocessing is required.

Table 2. Comparison on Example 3





DA 


CT (J = 5) 
(decoding step) 


CT (J = 5) 
(ppm step) 


CT (J = 5) 
(ppwm step) 


Ciphertext bits 


2™ 


2^79 


- 


- 


Complexity 




2^7 


2^b 


2^1 


Mem. (bytes) 


< 1024 


2^5 


2^7 


- 



4.2 Chepyzhov, Johansson and Smeets Attack 

In [5], V. Chepyzhov, T. Johansson and B. Smeets recently proposed a powerful
new idea to improve fast correlation attacks (CJS attack). The key idea is to
combine pairs of parity-check equations describing an $[N, L]$ code on a BSC of
error probability $p$, in order to obtain a code of lower dimension $[n_2, K]$
($K < L$) over another BSC of parameter $p_2$. The price to pay is $n_2 > N$
and $p_2 = 2p(1-p) = 1/2 - 2\varepsilon^2 > p$, where
$\varepsilon = 1/2 - p$. So a trade-off has to be found, since lowering $K$
increases $n_2$. In this part, only algorithm A1 of [5] will be considered,
since it has better complexity and required sequence length than algorithm A2.

Consider a register of length L and a BSC of parameter p— ^ For a given 
K the required length N of the observed sequence for algorithm A 1 to succeed 
is 

Then the decoding step complexity has order 0 ( 2 *^ • 712) where ri2 has mathe- 
matical expectation given by E{n2) = Complexity of the pre- 

computation is negligible. Storage requirement is at most n2{k + 21 og 2 iV) bits 
(generator matrix G2 of the new [n2,k] code). So by taking a small k, the 2 *^ 
factor is reduced but conversely it must be paid with the growth of 77,2, and thus 
of the observed sequence length N. 

Comparison with the Decimation Attack is not easy. The great advantage of CJS
is that it works even for registers of prime length, but conversely
preprocessing has to be done for each different polynomial. On the other hand,
the Decimation Attack seems to succeed more frequently for sequence lengths of
roughly the same order and requires no memory, whilst in the CJS attack memory
requirements grow with $n_2$ (that is to say when $K$ decreases).

A thorough comparison needs to be conducted and is currently in progress. The
DA attack as presented in this paper uses only ciphertext and needs to be
considered in the known-plaintext context to be compared to the CJS attack.
Moreover, the DA attack seems clearly less sensitive to the BSC error
probability $p$, since it uses an (optimal) Maximum Likelihood decoding. This
implies that for higher $p$ DA seems less complex and requires less ciphertext,
when considering the same decoding error probability.

All these assumptions need to be confirmed. Unfortunately, the data given in [5]
were only for a success probability set to 1/2. As a first attempt at
comparison, let us consider the example given in [5, Section 6] for $L = 40$.
Data are presented in Table 3 for a known-plaintext attack, in order to use the
results given in [5]. Note that in all cases, the CJS attack finds only
$K < L$ bits and needs to recover


Table 3. Comparison of DA and CJS attacks 





Decimation Attack 


CJS Attack 


BSC error probability 


0.45 


0.45 


Sequence length N 




2^3 


Success probability 


0.9 


i 


Complexity 




2^s 



the remaining ones, whilst DA recovers $L$ bits. The complexity and probability
of success in Table 3 thus have to be modified. Lowering $K$ will in fact shift
complexity onto the step recovering the $L - K$ remaining bits and greatly
increase the failure probability (in particular, the final probability of
success should be far lower than 1/2).

To conclude, as far as a comparison between the two attacks is possible, the
respective complexities of the two attacks seem to be close. Nevertheless, the
CJS attack is a very powerful, stimulating and challenging attack. Development
of the potential of the Decimation Attack is currently in progress.

Abstract. A5/1 is the stream cipher used in most European countries
in order to ensure privacy of conversations on GSM mobile phones. In this
paper we describe an attack on this cipher with total work complexity
$2^{39.91}$ A5/1 clockings, given $2^{20.8}$ known plaintext. This is the best
known result with respect to the total work complexity.



1 Introduction 

The GSM mobile phone system, which has more than 200 million users, uses 
encryption to protect the privacy of conversations. Two ciphers are used. A5/1 
is used in most European countries, and A5/2 is used in most other countries. 
Both ciphers combine a few (3 or 4) Linear Feedback Shift Registers (LFSRs),
whose clocks are controlled by some of their bits.

A5/1 and A5/2 were kept secret by the GSM companies for a long period of 
time. Only recently A5/1 and A5/2 were reverse-engineered and published by 
Briceno, Goldberg and Wagner at n Afterwards A5/2 was also cryptanalyzed 
by Wagner |5]. 

Before A5/1 was reverse-engineered several researchers, and in particular 
Anderson, identified and published PQ the general structure of A5/1. Golic [7] 
attacked this structure with parameters which are very close to the actual pa- 
rameters of A5/1 by solving a system of 2“^^ linear equations. This attack is likely 
to work on the real version of A5/1. The complexity of both attacks is equivalent 
to about 2'^'^ A5/1 clockings. 

A5/1 is a stream cipher with a 64-bit key. Encryption is also controlled by a 
22-bit frame number which is publicly known. In practice, A5/1 is always used 
with only 54 bits of key, with the remaining 10 bits set to zero. 

Biryukov and Shamir presented an improvement on the Golic time-memory
tradeoff attack in [2]. The attack requires about 2 minutes of GSM conversation,
and finds the key in a few seconds. However, their attack requires a large
pre-processing phase equivalent to $2^{42}$ A5/1 clockings with memory
complexity of four 73 GB hard-drives, or pre-processing of $2^{48}$ A5/1
clockings with memory complexity of two 73 GB hard-drives.

* This work was supported by the European Union fund IST-1999-12324 - NESSIE
and by the Technion's Chais' Excellence Program.

In this paper, we introduce a new technique for retrieving the internal state
of the cipher. Once the internal state of the cipher is known it is easy to find
the secret key by clocking the cipher backwards, using the technique presented
in [2].

This paper is organized as follows: In Section 2 we describe the A5/1
algorithm. In Section 3 we briefly describe the previous analysis of this
cipher. In Section 4 we describe the basic idea behind our attack. In Section 5
we show several improvements to the attack, which then requires $2^{20.8}$ bits
of known output stream and only $2^{39.91}$ steps of analysis (the time unit is
one A5/1 clocking). The attack finds the full internal state of the registers,
from which the original key can be derived easily, as presented in [2].

2 A Description of A5/1 

A5/1 combines 3 LFSRs as described in Figure 1. At each new step, 2 or 3
LFSRs are clocked, according to a clocking mechanism which we will describe
later. The output is the parity of the outputs of the 3 LFSRs.




Fig. 1. The A5/1 Structure 



We denote the LFSRs as $R_1$, $R_2$ and $R_3$. The lengths of $R_1$, $R_2$ and
$R_3$ are 19, 22 and 23 bits, respectively. The output of each LFSR is the last
bit (i.e., bits 18, 21 and 22, respectively). The registers are updated
according to their primitive polynomials, which are summarized in Table 1. The
clocking decision is based upon one bit of each register. The three bits are
extracted (bit 8 of $R_1$, bit 10 of $R_2$ and bit 10 of $R_3$) and their
majority is calculated. The two or three registers whose bit agrees with the
majority are clocked.

We denote the bits $j_1, \ldots, j_l$ of register $R_i$ by
$R_i[j_1, \ldots, j_l]$.

Register  Length   Primitive                    Clock-controlling  Bits that
number    in bits  polynomial                   bit (LSB is 0)     are XORed
--------  -------  ---------------------------  -----------------  -----------
1         19       x^19 + x^5 + x^2 + x + 1     8                  18,17,16,13
2         22       x^22 + x + 1                 10                 20,21
3         23       x^23 + x^15 + x^2 + x + 1    10                 22,21,20,7



Table 1. Parameters of the A5/1 Registers 



The initialization of the registers loads the bits of secret key Key, followed by 
the bits of the frame number Frame and discarding 100 output bits, as follows: 

1. Set all LFSRs to 0 ($R_1 = R_2 = R_3 = 0$).

2. For i := 0 to 63 do

   (a) $R_1[0] = R_1[0] \oplus Key[i]$

   (b) $R_2[0] = R_2[0] \oplus Key[i]$

   (c) $R_3[0] = R_3[0] \oplus Key[i]$

   (d) Clock all three registers (i.e., for $j > 0$, $R_i[j] \leftarrow
       R_i[j-1]$, and $R_i[0]$ is set to the result of the primitive polynomial
       on the previous value of $R_i$).

3. For i := 0 to 21 do

   (a) $R_1[0] = R_1[0] \oplus Frame[i]$

   (b) $R_2[0] = R_2[0] \oplus Frame[i]$

   (c) $R_3[0] = R_3[0] \oplus Frame[i]$

   (d) Clock all three registers.

4. For i := 0 to 99, clock the cipher by its regular majority clocking
   mechanism, and discard the output.

After the initialization, 228 bits of output stream are computed. 114 bits are 
used to encrypt data from the center to the mobile phone, and the other 114 
bits are used to encrypt data from the mobile phone to the center. 
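
The description above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, registers are held as integers with bit 0 as the LSB, and the sketch assumes that each output bit is read after the majority clocking of that step, an ordering the text does not fix.

```python
# Register parameters from Table 1: (length, clock-controlling bit, XORed bits).
REGS = ((19, 8, (18, 17, 16, 13)),
        (22, 10, (21, 20)),
        (23, 10, (22, 21, 20, 7)))


def clock(reg, length, taps):
    """Shift one register towards its MSB; bit 0 receives the XOR of the taps."""
    feedback = 0
    for t in taps:
        feedback ^= (reg >> t) & 1
    return ((reg << 1) & ((1 << length) - 1)) | feedback


def clock_all(state):
    """Clock all three registers unconditionally (used while loading Key/Frame)."""
    return [clock(r, n, taps) for r, (n, _, taps) in zip(state, REGS)]


def clock_majority(state):
    """Clock the two or three registers whose clock bit equals the majority."""
    bits = [(r >> c) & 1 for r, (_, c, _) in zip(state, REGS)]
    majority = 1 if sum(bits) >= 2 else 0
    return [clock(r, n, taps) if b == majority else r
            for r, b, (n, _, taps) in zip(state, bits, REGS)]


def output_bit(state):
    """Parity of the most significant bits of the three registers."""
    out = 0
    for r, (n, _, _) in zip(state, REGS):
        out ^= (r >> (n - 1)) & 1
    return out


def keystream(key_bits, frame_bits, nbits=228):
    """Steps 1-4 of the initialization, followed by nbits of output."""
    state = [0, 0, 0]                                   # step 1
    for b in list(key_bits) + list(frame_bits):         # steps 2 and 3
        state = clock_all([r ^ b for r in state])
    for _ in range(100):                                # step 4: discard output
        state = clock_majority(state)
    out = []
    for _ in range(nbits):
        # Assumption: the output bit is taken after clocking.
        state = clock_majority(state)
        out.append(output_bit(state))
    return out
```

With an all-zero key and frame number the registers never leave the zero state, so this sketch produces 228 zero bits; any serious use should be checked against published test vectors.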

3 Previous Work 

Several papers analyzing variants of A5/1 have been published [7,1,2]. One of
them [7] attacks an alleged version, which is very similar to the real A5/1. This
attack takes on average $2^{40.16}$ workload and finds the internal state of the cipher.
However, in [7] the time unit is the time needed to solve a linear equations system,
a unit which we do not use. We, on the other hand, use (like [2]) a workload unit
of one A5/1 clocking. Golic [7] also presents a time-memory tradeoff, which was
enhanced by Biryukov and Shamir in Pj after the first version of our paper was 
written. 

Golic's first attack [3] is based on creating sets of linear equations and solving
them. Although it seems that 64 linear equations are necessary, Golic observed
that on average 63.32 equations are sufficient, as only $2^{63.32}$ internal states are
possible after clocking. This can be easily identified by additional equations
which reject the 3/8 of the states that have no predecessors.

In the first step, the attacker guesses n clock controlling bits from each of the
three registers. Since these bits are sufficient to know the clocking sequence of 
4n/3 bits on average, the attacker can receive 4n/3 linear equations (on average) 
about the registers contents. In addition, the first output bit is known to be 
the parity of the most significant bits of the three registers, hence the attacker 
obtains another equation. Therefore, the attacker now has 3n + 4n/3 + 1 linear 
equations. In the process of analysis we assumed the bits to be independent, thus, 
n cannot be bigger than the shortest distance between the clock controlling bit 
and the output bit. 

For n = 10 the attacker can get on average 3n + 4n/3 + 1 = 44.33 linear
equations, thus he needs about 19 more equations. Golic observed that not all
$2^{19}$ options need to be considered. Instead, the attacker builds a tree with all
the valid options for the possible values for the three input bits to the majority
clock-control function. Since in 3/4 of the cases two new bits are considered and
in the remaining 1/4 of the cases 3 bits are considered, each node contains
$3/4 \cdot 4 + 1/4 \cdot 8 = 5$ options. However, when we walk along the path in the tree,
we need to consider only the cases for which the output of the registers agrees
with the output stream, which happens on average in 1/2 of the cases. Thus,
we travel along the path on the tree which corresponds to the output stream.
On this path each node has 2.5 valid options on average. Since the knowledge
of 4/3 m bits is sufficient to receive the linear equations about the m bits out of
each register (due to the clocking mechanism), the tree of clocking options needs
to be considered only until a depth of $4/3 \cdot 19/3 = 76/9 = 8.44$. Since each level
has a branching factor of 2.5, the amount of time needed to search the tree is
$2.5^{8.44} \approx 2^{11.16}$.

The total number of systems is $2^{30} \cdot 2^{11.16} = 2^{41.16}$ (there are $2^{30}$ possible
guesses for the n = 10 bits of each register, and each guess requires $2^{11.16}$
additional equations). As we search exhaustively, on average only half of the options
are needed until we encounter the right value, thus the complexity of this attack
is equivalent to solving $2^{40.16}$ sets of linear equations.
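
The arithmetic of this estimate can be retraced with a few lines of Python. This is only a check of the numbers quoted above; the variable names are illustrative, and the value 19 for the missing equations is the rounded figure used in the text.

```python
from math import log2

n = 10                                    # guessed clock-controlling bits per register
equations = 3 * n + 4 * n / 3 + 1         # linear equations known after the guess
missing = 19                              # "about 19 more equations", as in the text

options_per_node = 3 / 4 * 4 + 1 / 4 * 8  # 2 new bits in 3/4 of cases, 3 bits in 1/4
valid_per_node = options_per_node / 2     # half of them agree with the output stream
depth = (4 / 3) * (missing / 3)           # tree depth, 76/9

tree_log2 = depth * log2(valid_per_node)  # log2 of the tree search cost
total_log2 = 3 * n + tree_log2            # all guesses times the tree per guess
expected_log2 = total_log2 - 1            # on average half of the search is needed

print(f"equations known: {equations:.2f}")
print(f"tree search:     2^{tree_log2:.2f}")
print(f"expected work:   2^{expected_log2:.2f} linear systems")
```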

In P] Biryukov, Shamir and Wagner presented a time-memory tradeoff at- 
tack. In their attack a special pattern is chosen, and the attacker preprocesses 
many samples which generate this specific long pattern. Once the pattern oc- 
curs in the real output stream, the attacker finds the possible internal states, 
according to the rest of the output stream and the pre-processed table. The at- 
tack requires 2^® to 2^® pre-processing time, and has many trade-offs between 
the length of the conversation (and analysis), the pre-processing phase and the 
memory needed for the attack. 

4 Our Basic Attack 

The main idea behind the attack is to wait until an event which leaks a large 
amount of information about the key occurs and then to exploit it. Assume 
that for 10 consecutive rounds the third register ($R_3$) is not clocked. In such a
case, we gain about 31 bits of information about the registers. Assume we know
$R_3[10]$ (the clock controlling bit) and $R_3[22]$ (the output bit), then we know for
these 10 rounds that all the clock controlling bits in the other two registers are
the complement of $R_3[10]$ (thus receiving 20 bits, a bit from each register each
round). We also know that the output bit of the stream XOR $R_3[22]$ is equal
to the output bits of $R_1$ XOR $R_2$ (each round). There are at least 11 rounds
for which the same output bit is used from $R_3$, thus we receive another 11
linear equations about the registers. The equations are the parity of the couples
$R_1[t] \oplus R_2[t + 3]$ for $t = 8, \ldots, 18$.

Guessing 9 bits from $R_1$ ($R_1[18, 17, 16, 15, 14, 12, 11, 10, 9]$) and another one
from $R_2$ ($R_2[0]$) uncovers all of $R_1$ and $R_2$. Note that we do not need to guess
$R_1[13]$ as we know the parity bit $R_1[18] \oplus R_1[17] \oplus R_1[16] \oplus R_1[13]$, thus
guessing $R_1[18]$, $R_1[17]$, $R_1[16]$ gives $R_1[13]$, in addition to the corresponding bits of
$R_2$ ($R_2[21, 20, 19, 16]$). $R_2[11]$ can be easily computed since $R_1[8] \oplus R_2[11]$ is
known (from the output stream), and $R_1[8]$ is known (since it was the first
clock-controlling bit in register $R_1$).

The complexity so far is the cost of guessing twelve bits ($R_3[10]$, $R_3[22]$, $R_2[0]$,
and 9 bits of $R_1$), and we recover all the bits of $R_1$ and $R_2$. We can continue to
clock the states until $R_3$ moves, and gain another bit ($R_3[21]$).

By further guessing the bits $R_3[0], \ldots, R_3[9]$ we can continue clocking the
third register and receive all the unknown bits of $R_3$ (total of 11 unknown bits).
The workload of this stage is the cost of guessing 10 bits, multiplied by the cost
of work needed to retrieve the unknown bits for each guess. Since each time
the register $R_3$ advances we recover a new bit of $R_3$, we need to wait for 12
clockings of $R_3$ in order to retrieve the full register, and since the probability
that the register $R_3$ advances is 3/4, it is expected that 16 clocks suffice to
retrieve the whole register. Now we have a possibility for the full internal state
and need to check for its correctness. The amortized cost for checking whether
this is the right state is 2 clock cycles for each guess (all cases need to be clocked
at least once, half of the cases should be clocked another time, 1/4 a third time,
etc.). Thus, this stage requires $2^{10} \cdot 2^{4} \cdot 2 = 2^{15}$ workload.

The total expected running time of this attack is $2^{12} \cdot 2^{15} = 2^{27}$, given the
location where $R_3$ is static. This location is of course unknown, thus, we need
to examine about $2^{20}$ possible starting locations (from many possible frames).
Hence, the basic attack requires about $2^{47}$ running time and about $2^{20}$ bits of
the output stream.
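
The counts in this section can be retraced as follows. The sketch below only reproduces the arithmetic of the basic attack under the text's assumption that clock-controlling bits behave as independent uniform bits; all names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import log2

# A register stands still in one round exactly when its clock bit differs
# from the other two, which must then agree with each other.
stalled = sum(1 for c1, c2, c3 in product((0, 1), repeat=3) if c1 == c2 != c3)
p_stall = Fraction(stalled, 8)               # probability that R3 is not clocked
p_event = p_stall ** 10                      # ten consecutive stalls of R3
locations_log2 = -log2(float(p_event))       # starting locations to examine

first_stage_log2 = 12                        # R3[10], R3[22], R2[0], 9 bits of R1
expected_clocks = 12 / Fraction(3, 4)        # 12 advances of R3 at probability 3/4
second_stage_log2 = log2(2 ** 10 * float(expected_clocks) * 2)

per_location_log2 = first_stage_log2 + second_stage_log2
total_log2 = per_location_log2 + locations_log2

print(f"P(R3 stalls in a round) = {p_stall}")
print(f"work per location: 2^{per_location_log2:.0f}")
print(f"total work:        2^{total_log2:.0f}")
```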

Note that this complexity is similar to the complexity of earlier attacks (solving
$2^{40}$ linear equations sets in [7] can be done in about $2^{47}$ time).



5 Trick or Treat? 

We now present several techniques for reducing both time complexity and mem- 
ory complexity required for the attack. 

The first technique is based on a technique presented in P|. It is known 
that the polynomials of the registers of A5/1 are primitive, and thus the states 
(except for the all-zero state) form one huge cycle. Selecting any non-zero state 
and computing the next states iteratively would result in all (non-zero) states.
Therefore, for each register we choose a non-zero state, and clock the register
until we get the same state again. We store the states in a table in the order
they are computed. We call this table the next-state table. In this table, if entry
$i$ corresponds to state $x$, then the state consecutive to $x$ is stored in entry $i + 1$.
During the generation of this table, we also generate a table of pointers, in which
for each state x we keep its location in the next-state table.

Given the two tables (for each register), we can clock any state s by any
number of clockings c in a fixed time unit. The method for doing so is to access
entry s in the pointers table in order to find the location $l$ of s in the next-state
table, and then access entry $l + c$ in the next-state table. The cost of the first
access is equivalent to two (original) A5/1 clockings (as we access two tables),
but once the location in the next-state table is known, only one table needs
to be accessed. Thus, when a state should be clocked again (or iteratively) the
additional cost is equivalent to one A5/1 clocking.
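
A minimal Python sketch of the two tables follows. The function names are hypothetical, and the sketch makes two assumptions not stated in the text: the index $l + c$ is taken modulo the cycle length, and the demonstration uses a toy 4-bit LFSR (XORed bits 3 and 2) rather than an A5/1 register so that it runs instantly.

```python
def lfsr_clock(state, length, taps):
    """One clocking: shift towards the MSB, bit 0 receives the XOR of the taps."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) & ((1 << length) - 1)) | feedback


def build_tables(length, taps, start=1):
    """Walk the cycle from `start`; return (next-state table, pointer table)."""
    next_state = []                       # entry i + 1 holds the successor of entry i
    pointer = [None] * (1 << length)      # pointer[x] = location of x in next_state
    state = start
    while pointer[state] is None:
        pointer[state] = len(next_state)
        next_state.append(state)
        state = lfsr_clock(state, length, taps)
    return next_state, pointer


def clock_many(state, c, next_state, pointer):
    """Clock `state` c times using one pointer lookup and one table lookup."""
    location = pointer[state]
    # Assumption: wrap around the end of the table, since the states form a cycle.
    return next_state[(location + c) % len(next_state)]


# Toy example: a 4-bit register whose non-zero states form a single cycle.
table, ptr = build_tables(4, (3, 2))
print(len(table), clock_many(1, 3, table, ptr))
```

For the real registers one would call, for example, `build_tables(19, (18, 17, 16, 13))`; since the text states that the polynomials are primitive, the resulting table is expected to hold all $2^{19} - 1$ non-zero states.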

The memory size needed for the tables is about 71.5 MB. For each state we
need to store the next state and pointer to the next-state table. Since each of
these fields is in the size of the register, the total size of an entry is bounded by
$2 \cdot 23 = 46$ bits (or 6 bytes), and there are $2^{23}$ states in $R_3$, $2^{22}$ in $R_2$ and $2^{19}$ in
$R_1$. Thus the tables' size is about $(2^{23} + 2^{22} + 2^{19}) \cdot 2 \cdot 23 \approx 2^{29.17}$ bits, which is
about 71.5 MB.

For any possible value of the 5 most significant bits in each register and the
next 5 clock-controlling bits, we calculate as many bits of the future output
stream as possible. We build another table indexed by the first 5 bits of the
output stream, and the 20 bits of $R_1$ and $R_2$ (5 MSB and 5 clock-controlling
from each), which contains in each entry the possible values of the 10 bits from
$R_3$ which generate this 5-bit output stream. Note that for some fraction of the
cases 6 or 7 bits of output stream can be generated (given the 10 bits of each
register). An output stream of 8 bits or longer cannot be generated due to lack of
clock controlling bits, as each clocking requires at least two new clock controlling
bits out of a total of 15. Thus, given the output stream we can reduce the number
of possible values of the bits of $R_3$ which generate the output stream. In order
to do so we separate the cases according to the length of the output stream,
and if there are additional bits of the output stream, according to their value.
Each entry also contains information on how many clocks each register needs to
be clocked after the given known output stream has been generated by the 30
bits (10 from each register as above) of the state. The computation of this table
requires about $2^{30} \cdot 6 \approx 2^{32.6}$ A5/1 clockings, which can be computed in advance.

The length of the predicted output stream is a function of the 15 clock
controlling bits. In 15968 cases the output stream is in the length of 5 bits, in
14280 cases the stream's length is 6 bits, and in the remaining 2520 cases a 7-bit
output stream is generated. In the case in which we know 20 bits of $R_1$ and $R_2$
and the output stream, we need to check all the corresponding cases with a 5-bit
output stream, half of the cases with a 6-bit output stream and 1/4 of the cases
with a 7-bit output stream. Thus, we need to consider on average about 23.2
options for the 10 bits of $R_3$.
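
One way to arrive at the figure of about 23.2 from the case counts is sketched below. The weighting is an interpretation of the sentence above (each extra known output bit halves the surviving candidates), not a formula taken from the paper, and the names are illustrative.

```python
from math import log2

# Number of 15-bit clock-control patterns per predicted stream length.
counts = {5: 15968, 6: 14280, 7: 2520}
total = sum(counts.values())                 # all 2^15 patterns

# Every output bit beyond the fifth halves the candidates that survive.
weighted = sum(cnt / 2 ** (length - 5) for length, cnt in counts.items())

# 2^10 values of the R3 bits, filtered by the 5 indexed output bits.
options = weighted / total * 2 ** 10 / 2 ** 5
options_with_3_known_bits = options / 2 ** 3

print(f"patterns: {total} = 2^{log2(total):.0f}")
print(f"average options: {options:.1f} = 2^{log2(options):.2f}")
print(f"with 3 known bits of R3: {options_with_3_known_bits:.1f}")
```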

Note that the first time we access the table three bits are known ($R_3[22]$,
$R_3[21]$, $R_3[10]$), so we have about $2^{1.53} \approx 2.9$ options for the rest of the 10 - 3 = 7
bits. In order to use those known bits we need to sort the values in the table
according to those 3 bits, and then retrieve the valid $2^{1.53}$ values directly in
one table look-up. Sorting the table in that manner either increases the time of
analysis or table size by a small factor.

For each remaining case we clock the registers as many times as needed according
to the first table access (since the table contains the number of clockings
for each register). Then, we approach the table again and get about $2^{4.53}$ options
for 10 unknown bits of $R_3$.

We now lack 3 bits from $R_3$. Approaching the original table again would cost
us too much (we would get $2^{4.53}$ options and need to check them all), thus we use
a smaller table containing only 6 bits of each register. This table should suffice
to give us the last 3 unknown bits. This table is expected to give us for each case
0.88 possible values on average. In this stage, we discard the wrong options by
the regular method of clocking forward (we could use the table to do this
stage more efficiently, but the improvement of the complexity is negligible).

The total time complexity is calculated as follows: There are $2^{20}$ possible
starting locations, and each has $2^{12}$ possible guesses for the bits of the first stage
(9 from $R_1$, $R_2[0]$ and $R_3[10, 22]$). For each possibility we get about $2^{1.53}$ possible
values for some of the unknown bits of $R_3$, and the clocking costs 2 clocking units
(as this is the first approach to the next-state table). Then, for each possible value
for the first 7 unknown bits from $R_3$, we get about $2^{4.53}$ possible values for the
next 10 unknown bits of $R_3$. Each possibility needs to be clocked once, followed
by accessing the smaller table. About 0.88 cases remain on average, and we need
to clock each twice (on average) to check the guess. Thus, the time complexity
of the attack is equivalent to
$2^{20} \cdot 2^{12} \cdot 2^{1.53} \cdot 2 \cdot 2^{4.53} \cdot (1 + 1 + 2 \cdot 0.88) = 2^{40.97}$
A5/1 clockings.

We can improve this result by using more bits in the table. We build the table
based on 12 bits from $R_1$ and $R_2$ (6 most significant bits, and 6 clock-controlling
bits), and 10 bits from $R_3$. In that case 23328 options have a 5-bit output stream,
59808 a 6-bit output stream, 41496 a 7-bit output stream and 6440 an 8-bit output
stream. This way, given the 24 bits of registers $R_1$ and $R_2$ and the 5 bits of the
output stream, there would be only $2^{4}$ options on average for the 10 bits of $R_3$.
The complexity in this case is
$2^{20} \cdot 2^{12} \cdot 2^{1} \cdot 2 \cdot 2^{4} \cdot (1 + 1 + 2 \cdot 0.88) = 2^{39.91}$.
The preprocessing time is about $2^{34} \cdot 8 = 2^{37}$ A5/1 clockings, and the table size
is $2^{34} \cdot 2 = 2^{35}$ bytes, i.e., 32 GB of data. This needs to be multiplied by the
overhead of keeping the table indexed both by 20 known bits (10 of $R_1$ and 10
of $R_2$) and by 23 bits (10 of $R_1$, 10 of $R_2$ and 3 of $R_3$), so in one memory access
we get only the relevant possibilities, which is a factor 2, i.e., 64 GB.
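
The two complexity figures and the table size can be recomputed as below. The helper name `attack_log2` is hypothetical; the exponents passed to it are the ones quoted in the text, and the averaging of the larger table follows the same interpretation as for the smaller one (each extra output bit halves the candidates).

```python
from math import log2


def attack_log2(first_lookup_log2, second_lookup_log2):
    """log2 of: locations * guesses * first lookup * 2 clockings * second lookup * checks."""
    check_cost = 1 + 1 + 2 * 0.88
    return (20 + 12 + first_lookup_log2 + 1 + second_lookup_log2
            + log2(check_cost))


small_table = attack_log2(1.53, 4.53)     # table built from 10 bits per register
large_table = attack_log2(1.0, 4.0)       # table built from 12 bits of R1 and R2

# Average number of options per lookup in the larger table.
counts = {5: 23328, 6: 59808, 7: 41496, 8: 6440}
total = sum(counts.values())
weighted = sum(cnt / 2 ** (length - 5) for length, cnt in counts.items())
options = weighted / total * 2 ** 10 / 2 ** 5

table_bytes = 2 ** 34 * 2                 # 2^34 entries of 2 bytes each

print(f"10-bit table: 2^{small_table:.2f} clockings")
print(f"12-bit table: 2^{large_table:.2f} clockings")
print(f"options per lookup: {options:.1f}")
print(f"table size: {table_bytes // 2 ** 30} GB")
```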

We can, however, reduce the size of the tables by a factor of two, using the
observation that if we switch the known bits of $R_1$ with those from $R_2$, the result
for $R_3$ remains the same. This way we can reduce the size by almost a factor
of 2.



6 Summary 

We have shown a technique to cryptanalyze the A5/1 cipher. The attack is feasi- 
ble with current technology, which suggests that this scheme should be replaced. 
The attack requires 2^® ®^ A5/1 clockings. The attack requires 2®® • 228/(228 — 
63) = 2^®-® bits of data, which are equivalent to about 2.36 minutes of conver- 
sation. 

The retrieval of the key given the internal state can be easily done using the
algorithm presented in [7].

We summarize our results and the previously known results in Table 2.

Attack    Pre-computation    Complexity of Analysis    Time Unit    Data Complexity (bits)    Memory Complexity

Table 2. Attacks on A5/1 and their complexities

Abstract. Security analysis of block ciphers against linear cryptanalysis
has virtually always been based on the bias estimates obtained by the 
Piling-Up Lemma (PUL) method. Despite its common use, and despite 
the fact that the independence assumption of the PUL is known not to 
hold in practice, accuracy of the PUL method has not been analyzed to 
date. In this study, we start with an experimental analysis of the PUL 
method. The results on RC5 show that the estimates by the PUL method 
can be quite inaccurate for some non-Feistel ciphers. On the other hand, 
the tests with SP-structured Feistel ciphers consistently show a much 
higher degree of accuracy. 

In the second part, we analyze several theories for an alternative method 
for bias estimation, including correlation matrices, linear hulls, and sta- 
tistical sampling. We show a practical application of the theory of cor- 
relation matrices, where better estimates than the PUL method are ob- 
tained. We point out certain problems in some current applications of 
linear hulls. We show that the sample size required for a reliable statis- 
tical estimator is an impractically large amount for most practical cases. 



1 Introduction 

Estimating the bias of a given linear approximation is one of the most important
problems in linear cryptanalysis: the success rate of a linear attack is directly
related to the bias of the approximation it uses. Therefore, security analysis of
block ciphers against linear cryptanalysis is exclusively based on the estimation
of the bias of their linear approximations.

In practice, estimation of the bias is almost exclusively based on the Piling-Up
Lemma (PUL) [11], which is a very practical tool for bias estimation on
iterated block ciphers. To estimate the bias of a multi-round approximation, the 
round approximations are assumed independent and the bias of the combined 
approximation is calculated by the PUL. We will refer to this application of the 
PUL as the PUL method. 
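
For reference, the lemma states that the XOR of k independent approximations with biases $\epsilon_1, \ldots, \epsilon_k$ has bias $2^{k-1} \prod_i \epsilon_i$. A minimal Python sketch of this computation is given below; the function name is illustrative, and the example biases (1/4, 1/8, 1/4) are simply sample values of the kind used for the round approximations later in the paper.

```python
from fractions import Fraction


def piling_up(biases):
    """Bias of the XOR of independent approximations: 2^(k-1) * product of biases."""
    result = Fraction(1, 2)
    for b in biases:
        result *= 2 * Fraction(b)
    return result


# Three active round approximations with biases 1/4, 1/8 and 1/4.
one_iteration = piling_up([Fraction(1, 4), Fraction(1, 8), Fraction(1, 4)])
two_iterations = piling_up([Fraction(1, 4), Fraction(1, 8), Fraction(1, 4)] * 2)
print(one_iteration, two_iterations)
```

Exact fractions are used so that small biases over many rounds are not affected by floating-point rounding.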

In the first part of this study, we analyze the bias estimates obtained by the 
PUL method. Although the PUL method has been widely used, and although it 
is known that this method’s assumption of independent round approximations 
is virtually never true, the accuracy of the estimates obtained by this method 
has almost never been the subject of a study.^1 Our analysis concentrates on
two cases: DES-like SP-structured Feistel ciphers and RC5. The Feistel ciphers
represent the class of ciphers that the PUL method was originally applied on.
RC5 represents a cipher that has a totally different structure (which is based
on a mixture of arithmetic operations and data-dependent rotations, instead of
the traditional substitution and permutation structures). In the study of Feistel
ciphers, we analyze the accuracy of the estimated values with respect to various
factors, including the number of active s-boxes in a round, presence/absence of
a bit expansion function, etc.

The analysis results show that the PUL method gives quite accurate estimates
with SP-structured Feistel ciphers, especially for approximations with at
most a single active s-box at each round, as long as the estimated values are
significantly higher than $2^{-n/2-1}$. With RC5, the estimates turn out to have a much
lower accuracy.

In the second part of this study, we look for an alternative estimation method 
which would give more accurate estimates than the PUL method in general 
(e.g., for non-Feistel ciphers, or for large number of rounds where the PUL 
method gives too small values). For this purpose, we analyze the theories of 
correlation matrices, linear hulls, and statistical sampling. We give an example 
application of correlation matrices for bias estimation, which gives consistently 
better estimates than the PUL method on RC5. We review the theory of linear 
hulls, which has also been used as an alternative technique for bias estimation. 
We point out certain problems with some current applications of linear hulls 
where the application has no basis in theory. Finally, we look at the prospects of 
estimating the bias by statistical techniques over a randomly generated sample 
of plaintext/ciphertext blocks. It turns out that the statistical techniques do not 
provide any practical solutions for bias estimation, especially when the inverse 
square of the bias is an impractically large amount for a sample size. 

Notation: Throughout the paper, we use n to denote the block size and r
to denote the number of rounds in an iterated block cipher. $K_i$ denotes the ith
round key, $L_i$ and $R_i$ denote the left and right halves of the round output, p is
used for the probability of an approximation, where $|p - 1/2|$ is the bias. Bits
in a block are numbered from right to left, beginning with 0. The operator $\cdot$
denotes the bitwise dot product.



2 Experiments with RC5 

During some linear cryptanalysis experiments with small block sizes of RC5, we 
noticed significant differences between the actual bias of a linear approximation 
and the values that were estimated by the PUL method. We summarize these 
findings in this section. 

The RC5 encryption function is: 



^1 One exception in this regard is [2], where the accuracy of PUL estimates was studied
in the specific context of combining two neighbor s-box approximations in DES.

$L_1 = L_0 + K_0$
$R_1 = R_0 + K_1$

for i = 2 to 2r + 1 do

    $L_i = R_{i-1}$

    $R_i = ((L_{i-1} \oplus R_{i-1}) \lll R_{i-1}) + K_i$

The best currently-known linear approximation of RC5 |S] is 

$R_0[0] \oplus L_{2r}[0] = K_1[0] \oplus K_3[0] \oplus \cdots \oplus K_{2r-1}[0]$. (1)

The probability of this approximation is estimated as $1/2 + 1/(2w^{r-1})$ by the PUL
method, where w is the word size in bits (in RC5, half of a block is called a word).

We computed the bias of Approximation (1) by exhaustively going over all
plaintext blocks for various values of w and r. The test results are summarized
in Table 1. The results show quite significant differences between the actual and
the estimated values of the bias. Another remarkable point is that increasing the
number of rounds does not affect the bias after a certain point, and the bias does
not get much smaller than $2^{-n/2-1}$. We further discuss these results in Section 4.
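
The kind of exhaustive experiment described above can be sketched in Python for a small word size. This is an illustration only: the names are hypothetical, round keys are drawn at random instead of being produced by the RC5 key schedule, and a single key is used where the paper averages over many keys.

```python
import random
from itertools import product


def rotl(x, s, w):
    """Rotate the w-bit word x left by s mod w positions."""
    s %= w
    return ((x << s) | (x >> (w - s))) & ((1 << w) - 1)


def left_halves(L0, R0, keys, w):
    """Return [L_0, L_1, ..., L_m] for keys K_0..K_m, in half-round form."""
    mask = (1 << w) - 1
    L, R = (L0 + keys[0]) & mask, (R0 + keys[1]) & mask
    halves = [L0, L]
    for k in keys[2:]:
        # L_i = R_{i-1};  R_i = ((L_{i-1} xor R_{i-1}) <<< R_{i-1}) + K_i
        L, R = R, (rotl(L ^ R, R, w) + k) & mask
        halves.append(L)
    return halves


def actual_bias(keys, w, r):
    """Bias of R_0[0] xor L_2r[0] = K_1[0] xor K_3[0] xor ... xor K_{2r-1}[0]."""
    key_parity = 0
    for i in range(1, 2 * r, 2):
        key_parity ^= keys[i] & 1
    hits = 0
    for L0, R0 in product(range(1 << w), repeat=2):
        L2r = left_halves(L0, R0, keys, w)[2 * r]
        if ((R0 ^ L2r ^ key_parity) & 1) == 0:
            hits += 1
    return abs(hits / 4 ** w - 0.5)


def pul_estimate(w, r):
    """Bias predicted by the PUL method: 1 / (2 * w^(r-1))."""
    return 1 / (2 * w ** (r - 1))


w = 4
rng = random.Random(2000)          # fixed seed so that the run is repeatable
for r in (1, 2, 3, 4):
    keys = [rng.randrange(1 << w) for _ in range(2 * r + 2)]
    print(r, actual_bias(keys, w, r), pul_estimate(w, r))
```

For r = 1 the approximation holds for every plaintext, because the least significant bit of $R_0 + K_1$ is $R_0[0] \oplus K_1[0]$, so both the measured and the estimated bias equal 1/2.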

Table 1. Average actual bias of Approximation (1) and the bias estimated by the
PUL for various values of w and r, with 500 randomly chosen keys for each w and r. 
The results show a significant difference between the actual bias values and the PUL 
estimates. The difference increases sharply with increasing number of rounds. 



3 Experiments with Feistel Ciphers 

Following the findings on RC5 described in Section 2, we performed similar tests
with Feistel ciphers, which is the type of cipher the PUL method was originally 
used for El . In this section, we describe these tests and summarize their results. 

3.1 Design 

The ciphers used in these experiments are Feistel ciphers with 32-bit block sizes. 
The encryption function of a Feistel cipher is of the following form: 

for i = 1 to r do

    $L_i = R_{i-1}$

    $R_i = L_{i-1} \oplus F(R_{i-1}, K_i)$

The F function we used in the experiments has a structure similar to that in
DES. It has a sequence of key addition, substitution, and permutation stages. In
the key addition stage, $R_{i-1}$ is XORed with the round key $K_i$, with a possible bit
expansion before the XOR. The Feistel ciphers used in the experiments include
both those with an expansion function E and those without one. In ciphers with
a bit expansion, the expansion function E is the equivalent of the expansion
function in DES, reduced to 16 × 24 bits.

The substitution stage is also similar to that in DES, with four parallel 
distinct s-boxes. The s-boxes are either 4 x 4 or 6 x 4, depending on the presence 
or absence of the bit expansion. The 4x4 s-boxes are chosen from the s-boxes 
of Serpent [3, and the 6x4 s-boxes are chosen from the s-boxes of DES [^. 

The permutation stage uses a 16 x 16 permutation function P to mix the 
output bits from the s-boxes. In the Serpent s-boxes, unlike the DES s-boxes [^, 
there is no distinction among the role of input bits. So, with the Serpent s- 
boxes we use the following simple permutation P which guarantees that each 
output bit from an s-box affects a different s-box in the next round: P = 
(15 11 7 3 14 10 6 2 13 9 5 1 12 8 4 0).^2 With the DES s-boxes, we use the following
permutation which guarantees that the output of each s-box has an effect
on two outer bits, two inner non-middle bits, and two middle bits (of different 
s-boxes) in the next round: P = (12 10 3 6 14 11 4 2 15 9 7 1 13 8 5 0). 
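
The stated property of the first permutation can be checked mechanically. The sketch below assumes that the leftmost entry of each list gives the new location of bit 15 and the rightmost entry that of bit 0; the text does not state the listing order, and the helper names are illustrative.

```python
P_SERPENT = (15, 11, 7, 3, 14, 10, 6, 2, 13, 9, 5, 1, 12, 8, 4, 0)
P_DES = (12, 10, 3, 6, 14, 11, 4, 2, 15, 9, 7, 1, 13, 8, 5, 0)


def new_position(perm, bit):
    """New location of `bit`, assuming the leftmost list entry belongs to bit 15."""
    return perm[15 - bit]


def permute(perm, x):
    """Apply the bit permutation to the 16-bit value x."""
    y = 0
    for bit in range(16):
        if (x >> bit) & 1:
            y |= 1 << new_position(perm, bit)
    return y


def target_sboxes(perm, sbox):
    """Indices of the s-boxes (0 = rightmost) reached by the 4 output bits of `sbox`."""
    return sorted(new_position(perm, b) // 4
                  for b in range(4 * sbox, 4 * sbox + 4))


for sbox in range(4):
    print(sbox, target_sboxes(P_SERPENT, sbox))
```

Under this reading, each 4 × 4 s-box feeds one bit into every s-box of the next round, which is the guarantee claimed for the Serpent-based ciphers.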

Notation: Each Feistel cipher used in the tests will be identified by the
presence/absence of the expansion in key addition, and by the numbers of the
s-boxes used (ordered from left to right). For the numbering of the s-boxes, the
numbering in their original ciphers will be used. E.g., FC_NE0214 will denote the
Feistel cipher with no bit expansion and with the Serpent s-boxes $S_0$, $S_2$, $S_1$,
$S_4$. FC_E8735 will denote the Feistel cipher with the bit expansion E and with
the DES s-boxes $S_8$, $S_7$, $S_3$, $S_5$.

We start our experiments with the FC_NE ciphers, which are the simpler case
since there is no issue of duplicate bits. We first look at the approximations with
at most a single active s-box at each round. Then we go to the approximations
with multiple s-boxes in the same round.



3.2 Approximations with Single Active S-Box 

First, we look at how the PUL method performs on the approximations with 
at most a single active s-box at every round, which is the most basic type of 
linear approximations of an SP-structured Feistel cipher. We denote the round 
approximations in terms of the s-box approximations, as in mm- 

We consider the 4-round iterative approximations of the form ABC-, which
can be combined by itself as ABC-CBA-ABC-..., where A, B and C are some
non-trivial approximations of the F function, and "-" denotes the nil approximation.
This is the form of the approximations which gave the highest bias on
DES [11,12], and which also gives the highest bias on our FC_NE ciphers. Here
we present the results for three of our test cases. The results are summarized in
Table 2. The bias is denoted by b.

^2 The numbers show the new location of the bits after permutation.

Case 1.1: Cipher: FC_NE1745, A: $4 \cdot x = 11 \cdot S_1(x)$, b = 1/4, B: $8 \cdot x = 8 \cdot S_7(x)$,
b = 1/8, C: $4 \cdot x = 15 \cdot S_1(x)$, b = 1/4.

Case 1.2: Cipher: FC_NE6530, A: $2 \cdot x = 13 \cdot S_5(x)$, b = 1/4, B: $4 \cdot x = 4 \cdot S_3(x)$,
b = 1/8, C: $2 \cdot x = 15 \cdot S_5(x)$, b = 1/4.

Case 1.3: Cipher: FC_NE0214, A: $4 \cdot x = 8 \cdot S_1(x)$, b = 1/8, B: $2 \cdot x = 2 \cdot S_2(x)$,
b = 1/8, C: $4 \cdot x = 12 \cdot S_1(x)$, b = 1/8.

Table 2. Test results for single-sbox approximations. PUL estimates are quite
accurate, as long as they are above $2^{-n/2-1}$. Like the results on RC5, the bias does
not go much below $2^{-n/2-1}$.



3.3 Approximations with Multiple S-Box 

In this section, we look at how having multiple active s-boxes in the same round
affects the accuracy of PUL estimation. We focus our experiments on approximations
with two s-boxes, because in our miniature ciphers the bias with three
or four active s-boxes drops too fast to draw any useful conclusions.

We work with the 3-round iterative approximations of the form AB-, which
can be combined by itself as AB-BA-AB-..., where A and B are approximations
of the F function with two active s-boxes. Three such approximations are given
below. The results are summarized in Table 3.

Case 2.1: Cipher: FC_NE5614, A: ($8 \cdot x = 8 \cdot S_1(x)$, b = 1/4) AND ($3 \cdot x = 8 \cdot S_5(x)$,
b = 1/4), B: ($9 \cdot x = 9 \cdot S_1(x)$, b = 1/4) AND ($9 \cdot x = 9 \cdot S_4(x)$, b = 1/4)

Case 2.2: Cipher: FC_NE4250, A: ($10 \cdot x = 10 \cdot S_0(x)$, b = 1/4) AND ($10 \cdot x =
10 \cdot S_5(x)$, b = 1/4), B: ($8 \cdot x = 3 \cdot S_4(x)$, b = 1/4) AND ($3 \cdot x = 3 \cdot S_5(x)$, b = 1/4)

Case 2.3: Cipher: FC_NE5014, A: ($8 \cdot x = 3 \cdot S_4(x)$, b = 1/4) AND ($3 \cdot x = 3 \cdot S_5(x)$,
b = 1/4), B: ($9 \cdot x = 9 \cdot S_1(x)$, b = 1/4) AND ($9 \cdot x = 9 \cdot S_1(x)$, b = 1/4)

(a) Case 2.1 (b) Case 2.2 (c) Case 2.3

Table 3. Test results for approximations with two active s-boxes in a round. PUL
estimates are somewhat less accurate than those in Table 2 for single-sbox
approximations, but still better than those on RC5.



3.4 Approximations with Expansion 

Here we look at the effect of having an expansion function at the key addition
stage. When there is an expansion at the key addition stage like the E function
in DES and our FC_E ciphers, an approximation of an s-box not only affects the
input to that active s-box, but also affects the two shared input bits with the
neighbor s-boxes. Therefore, the output of the neighbor s-boxes will also be more
or less affected by an s-box approximation.

We tested the accuracy of the PUL estimates with certain approximations of
the FC_E ciphers. The tests are focused on approximations with a single active
s-box at every round, because the bias of approximations with multiple active
s-boxes drops too fast in the FC_E ciphers. The approximations used in the tests are
iterative approximations of the form ABC-CBA-ABC-... The tested approximations are
listed below, and the results are summarized in Table 4.

Case 3.1: Cipher: FC_E5216, A: $16 \cdot x = 15 \cdot S_5(x)$, b = 20/64, B: $8 \cdot x = 8 \cdot S_1(x)$,
b = 4/64, C: $16 \cdot x = 14 \cdot S_5(x)$, b = 10/64.

Case 3.2: Cipher: FCis8735, A: 16-a: = b = 8/64, B: 4-x = 2-5'8(a:), 

& = 2/64, C: 16-a: = 15-5'5(a:), 6 = 20/64. 



3.5 Other Approximations 

The ciphers and approximations considered in these tests are by no means ex- 
haustive, and in fact there are many different Feistel ciphers and approximations 
possible. The purpose of the tests is not to exhaustively prove a result about 
the bias of Feistel ciphers, but to obtain a general view of the accuracy of PUL 
estimation on Feistel ciphers. As we will discuss in Section 4, these results indeed
give a general idea on the subject. 

(a) Case 3.1 (b) Case 3.2

Table 4. Test results for single-sbox approximations with the expansion E. PUL
estimates are slightly less accurate than those without E, given in Table 2.



4 Discussion on the Results 

The test results on Feistel ciphers show that the PUL method is quite effective 
for bias estimation with SP-structured Feistel ciphers, as long as the estimated 
values are significantly higher than $2^{-n/2-1}$. The results are best when there is at
most a single active s-box in each round approximation, and when there is no bit 
expansion. When there are more affected s-boxes in a round approximation, the 
number of bits affected by the approximation increases, and so does the effect 
it has on the following round approximations (i.e. dependence among round 
approximations) . 

The test results on RC5 show that accuracy of the PUL estimates may not 
be so good with ciphers that are not SP-structured Feistel ciphers. In the test 
results with RC5, there is a considerable difference between the estimated and 
actual values even for smaller number of rounds. With larger number of rounds, 
the bias may be significantly higher than 2“ 2 ' even after the estimated values 
become lower than 2“?. We can say, looking at the test results, that larger 
differences should be expected in practice with larger block sizes (i.e. with 64- 
and 128-bit blocks). 

It is not easy to explain the difference in the accuracy of the estimates with 
RC5 and with Feistel ciphers: The source of inaccuracy of a PUL estimate is 
the dependence between round approximations, which is a factor that has to be 
neglected by the PUL method by its very definition. Both the RC5 round ap- 
proximations and the single-sbox FCat_e round approximations affect (i.e., give 
information on) 4 out of 16 bits of $R_{i-1}$. Moreover, in the FCatb approximations,
there are three non-trivial round approximations in every four rounds, whereas 
in the RC5 approximation there are only two non-trivial approximations in four 
(half)rounds. So, it would be natural to expect that there would be more depen- 
dence and interaction among the FC n e round approximations, which turns out 
not to be the case. For now, we accept the accuracy of the PUL estimates on 
Feistel ciphers as an experimental result and do not go into an in-depth analysis 
of the factors underlying it. 
Similarly, we cannot give a simple explanation for why the actual and the
estimated bias values for the Feistel ciphers agree so closely until they reach the
$2^{-(n+1)/2}$ threshold, and become so divergent after that point. It is not possible to simply
explain this by the accumulation of the dependence effect with more rounds,
since a comparison of the test results with low-bias and high-bias approximations
suggests that the point where the actual and the estimated values diverge is not
determined by the number of rounds, but is mostly determined by the proximity
to the $2^{-(n+1)/2}$ threshold.

4.1 Stabilization of the Bias 

Here we give two theorems related to the stabilization of the bias around $2^{-n/2-1}$.
Although the theorems do not explain how the sharp change in accuracy of PUL
estimates at $2^{-(n+1)/2}$ is related to the dependence between round approximations,
they provide some important facts about the stabilization of the bias. The first
theorem gives a lower bound for the bias of the best approximation. The second
theorem suggests that when the bias of the best approximation approaches the
theoretical lower bound, the bias of almost all linear approximations of the cipher
should be around $2^{-n/2-1}$. The first theorem follows from Theorem 4 in [4] with
p = q = n (see Appendix A). The second theorem is observed in the proof of
that theorem. $F_k$ is an n-bit block cipher with key K = k; $p_{a,b}$ denotes the
probability of the approximation $a \cdot X \oplus b \cdot F_k(X) = 0$.

Theorem 1. $\exists\, a, b \in \{0,1\}^n$ such that $|p_{a,b} - 1/2| \ge 2^{-(n+1)/2}$.

Theorem 2. $E_{a,b \in \{0,1\}^n}\left[\,|p_{a,b} - 1/2|^2\,\right] = 2^{-n-2}$.
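
Both statements can be checked exhaustively on a small block size. The following Python sketch is not part of the original experiments: it assumes a toy 6-bit random permutation in place of $F_k$, the names `bias_table` and `check_bias_theorems` are illustrative, and Theorem 1 is read over nonzero output masks b as in the Chabaud-Vaudenay bound.

```python
import random

N_BITS = 6  # toy block size (an assumption; real ciphers are far too large for this)

def parity(x: int) -> int:
    """Parity of the set bits of x (a GF(2) dot product once x = mask & value)."""
    return bin(x).count("1") & 1

def bias_table(perm, n):
    """table[a][b] = P_x(a.x XOR b.F(x) = 0) - 1/2 for the n-bit map F given as a list."""
    size = 1 << n
    table = []
    for a in range(size):
        row = []
        for b in range(size):
            hits = sum(1 for x in range(size)
                       if parity(a & x) == parity(b & perm[x]))
            row.append(hits / size - 0.5)
        table.append(row)
    return table

def check_bias_theorems(n, seed=1):
    """Return (mean squared bias over all a, b; largest |bias| over b != 0)."""
    rng = random.Random(seed)
    perm = list(range(1 << n))
    rng.shuffle(perm)  # a random permutation standing in for F_k
    table = bias_table(perm, n)
    size = 1 << n
    mean_sq = sum(v * v for row in table for v in row) / (size * size)
    best = max(abs(table[a][b]) for a in range(size) for b in range(1, size))
    return mean_sq, best

mean_sq, best = check_bias_theorems(N_BITS)
print("mean squared bias:", mean_sq, "vs 2^(-n-2) =", 2.0 ** (-N_BITS - 2))
print("best bias:", best, "vs 2^(-(n+1)/2) =", 2.0 ** (-(N_BITS + 1) / 2))
```

The mean squared bias is a consequence of Parseval's relation, so it should match $2^{-n-2}$ for any choice of permutation; only the best bias varies with the seed.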

5 Alternative Methods for Bias Estimation 

In this section, we analyze several theories for an alternative method for bias 
estimation, including correlation matrices, linear hulls and statistical sampling. 

5.1 Correlation Matrices 

For a function $f : \{0,1\}^m \to \{0,1\}^n$, the correlation matrix $C^{(f)}$ is a $2^n \times 2^m$
matrix whose $(b, a)$th entry $c^{(f)}_{ba}$ is the correlation coefficient
$2 P_X(a \cdot X = b \cdot f(X)) - 1$ [7]. The relationship between the correlation coefficients and the bias
is straightforward: If $F_k$ is a block cipher with key K = k, the bias of
$a \cdot X \oplus b \cdot F_k(X) = d \cdot K$ (for any d) equals $|c^{(F_k)}_{ba}|/2$. (The bias does not depend
on the key mask here, because the key is a fixed parameter, which is also the case in a
linear attack.) For $f : \{0,1\}^l \to \{0,1\}^m$, $g : \{0,1\}^m \to \{0,1\}^n$,
we have $C^{(g \circ f)} = C^{(g)} \times C^{(f)}$. So, if $F_k$ is an iterative cipher and $f_i$ is the ith
round with its respective round key, we have $C^{(F_k)} = \prod_{i=1}^{r} C^{(f_i)}$. Then $c^{(F_k)}_{ba}$
equals $\sum_{a_1, a_2, \ldots, a_{r-1}} \prod_{i=1}^{r} c^{(f_i)}_{a_i a_{i-1}}$, where $a_0 = a$, $a_r = b$, and each
$\prod_{i=1}^{r} c^{(f_i)}_{a_i a_{i-1}}$ is known as the correlation contribution coefficient (CCC) of the linear trail

$a_0 \to a_1 \to \cdots \to a_r$.

So, the theory of correlation matrices tells us that the bias of a linear approximation
is equal to the sum of the PUL biases (without absolute value) of all
linear trails that lead to the given approximation. Hence, correlation matrices 
provide a generalization of the PUL method: Instead of using the PUL bias of 
a single linear trail, the bias can be estimated by summing up the PUL bias of 
as many linear trails as possible which lead to the given approximation. We will 
refer to this generalization of the PUL method as the CM method. 
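
The composition rule and the trail sum can be illustrated on a toy example. The sketch below is an assumption-laden illustration rather than the setup used in the tests: it takes two unkeyed random 4-bit permutations as "rounds" (a key XOR would only change signs of matrix entries), and all function names are hypothetical.

```python
import random

def parity(x: int) -> int:
    """Parity of the set bits of x."""
    return bin(x).count("1") & 1

def correlation_matrix(f, m, n):
    """C[b][a] = 2 * P_x(a.x = b.f(x)) - 1 for f: {0,1}^m -> {0,1}^n given as a list."""
    size_in = 1 << m
    return [[2 * sum(parity(a & x) == parity(b & f[x]) for x in range(size_in)) / size_in - 1
             for a in range(size_in)]
            for b in range(1 << n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

n = 4
rng = random.Random(7)
f = list(range(1 << n)); rng.shuffle(f)   # toy round 1
g = list(range(1 << n)); rng.shuffle(g)   # toy round 2
h = [g[f[x]] for x in range(1 << n)]      # two-round composition g(f(x))

C_f, C_g, C_h = (correlation_matrix(t, n, n) for t in (f, g, h))
C_prod = matmul(C_g, C_f)                 # should reproduce C_h entry by entry

a, b = 0b0001, 0b0001                     # input and output masks of the approximation
trail_cccs = [C_g[b][a1] * C_f[a1][a] for a1 in range(1 << n)]  # one CCC per trail a -> a1 -> b
best_trail = max(trail_cccs, key=abs)     # a single-trail (PUL-style) estimate, in correlation units
print(C_h[b][a], sum(trail_cccs), best_trail)
```

Summing all sixteen trail contributions recovers the exact correlation, while `best_trail` alone generally does not; halving either value converts it to a bias.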



An Example Application: As an example, we apply the CM method to our 
RC5 approximation (1). All effective RC5 approximations obtained so far [IHIDIltild]
are based on round approximations with a single active input and output bit. To
be expandable over multiple rounds, the 1-bit round approximations should be of 
the form i?j [wi] 0 i [m'] = Si[mi](Bc where mi, m' <lgw and c is a constant. 
These approximations are analyzed in detail by Borst et al. [3]. The probability of
the 1-round approximation 



R^[mi] 0 = Si[mi] 0 (m^ - m')[m'], 

where {mi — denotes the m'th bit of {mi — m[) mod w, is equal to 

w ~ 2 ^)’ ^^6re s denotes Si mod 2”*^ Hence, the correlation coefficient 
of Ri[mi] = is equal to (-1)“^^ (l - ), where 6 = Si[mi] 0 {mi - 

m'i)[m'i\. The 1-bit trails that lead to Approximation ([I]) are those which satisfy. 



mi = 0 , 

mi = m '^2 < for i = 3, 5, . . . , 2r — 3, 

m2r-l = 0. 



We computed the bias by adding up the correlation contribution coefficients of 
all 1-bit linear trails of this form. The results are given in Figure 1 and are
compared to the actual bias and the PUL estimate values. 
Fig. 1. Comparison of the CM, PUL and actual bias values for w = 16 over the
key sample used in Table 1. The CM estimates are consistently better than the PUL
estimates, but their accuracy too drops exponentially with the number of rounds.



We would like to note that this example application on RC5 is intended to 
illustrate the practical usage of the theory of correlation matrices; it does not 
show the limits of the theory. In fact, it is possible to obtain better estimates
than those given in Figure 1 by including multiple-bit trails in bias estimation as
well as the single-bit trails. But eventually the accuracy of the estimates should 
be expected to drop with the increasing number of rounds, since the number of 
trails that can be considered in bias estimation can be no more than a certain 
tractable number; but the number of all trails that contribute to the bias of an 
approximation increases exponentially with the number of rounds. 



5.2 Linear Hulls and Correlation Matrices 

Like correlation matrices, linear hulls [14] can also be used to combine the bias
of linear trails. But unlike correlation matrices, this kind of application of linear
hulls is proven specifically for DES-like ciphers. In fact, in the Fundamental Theorem
of Linear Hulls (see Appendix B), the summation for the average squared
bias is over different key masks, not over linear trails. But since in a DES-like
cipher there is a one-to-one correspondence between a linear trail and a key
mask, the summation can be transformed into a summation over linear trails
(see Theorem 2 in [14]). However, this argument is not equally applicable to all
ciphers. For example, for the addition operation X = Y + K, the input and output
masks for the round are not uniquely determined by the key mask; i.e., for
a given $c_i > 1$, there are many values of $a_i$ and $b_i$ such that the approximation
$a_i \cdot X = b_i \cdot Y + c_i \cdot K$ has a non-zero bias. So, for example in RC5, there may be
many linear trails corresponding to the same key mask c.

In short, we would like to point out that the application of linear hulls to 
combine bias from linear trails has been proven specifically for DES-like ciphers, 
and it should not be used with arbitrary ciphers unless a proof is given for that 
application. In this respect, certain bias studies with linear hulls (e.g. m) have 
no theoretical basis H 

Another confusion with the application of linear hulls is that linear hulls
are often taken as the exact analog of differentials in differential cryptanalysis;
i.e., it is assumed that $|bias| = \sum_{LT(a,b)} |\text{PUL bias}|$, where $\sum_{LT(a,b)}$ denotes
the summation over all linear trails with plaintext mask a, ciphertext mask b.
Obviously, this equation has no basis in the theory of linear hulls. A similar
but correct equation is the one given by correlation matrices where bias is taken
without absolute value; $bias = \sum_{LT(a,b)} (\text{PUL bias})$. So, even though it is wrong
to use $\sum_{LT(a,b)} |\text{PUL bias}|$ for bias estimation, it can be used as an upper
bound for bias in analyzing the security of a cipher against linear cryptanalysis.

For a formal definition of DES-like ciphers for linear hulls, see [1 411 .'IJ . 

® This theorem on combining the squared bias of linear trails in a DES-like cipher 
is recently given an alternative proof by Nyberg nsi, which is based on correlation 
matrices rather than linear hulls. 

® A mid-solution for these applications can be possible if each trail used in bias esti- 
mation can be shown to match a different key mask. 

^ Simply note that every equation in linear hulls is in terms of squared bias rather 
than the bias itself. 
However, the correct reference for this kind of application should be correlation
matrices rather than linear hulls. 



5.3 Estimation by Sampling 

To estimate a parameter of a population where the population is too large to 
count every member, a random sample from the population can be used to 
estimate the desired parameter. For a typical block cipher size (e.g. 64 or 128
bits), there are too many plaintext/ciphertext blocks to calculate the actual
bias by going over every block; so, a random sample of blocks can be used to
estimate the bias. Here we look at a number of alternative statistical estimators
for estimating the bias over a random sample of plaintext blocks. Throughout
this section, N is the sample size, T is the number of ciphertexts in the sample
satisfying the approximation. $E[\cdot]$ denotes the expected value, $Var[\cdot]$ denotes
the variance, $\theta$ denotes the bias $|p - 1/2|$. $\hat\theta$ is used for estimators for $\theta$. MSE
denotes the mean squared error, $E[(\hat\theta - \theta)^2]$.

The UMVUE: One of the most important point estimators in statistics is the 
uniform minimum variance unbiased estimator (UMVUE), which is the (unique) 
unbiased estimator that has the minimum variance among all unbiased estimators,
under all values of the parameter to be estimated [10]. Regarding the
UMVUE, we prove the following negative result: 

Theorem 3. No unbiased estimator exists for |p — 1/2| over a random plaintext 
sample. 

Proof. T is binomially distributed. Assume $\hat\theta_N(T)$ is an unbiased estimator:

$$E[\hat\theta_N] = \sum_{T=0}^{N} \hat\theta_N(T) \binom{N}{T} p^T (1-p)^{N-T} = |p - 1/2|, \qquad (2)$$

for all 0 < p < 1. Now, define $\rho = p/(1-p)$ so that $p = \rho/(1+\rho)$ and
$1 - p = 1/(1+\rho)$. For 1/2 < p < 1, Equation (2) becomes

$$\sum_{T=0}^{N} \hat\theta_N(T) \binom{N}{T} \rho^T = (1+\rho)^{N-1}(\rho - 1)/2 = \sum_{T=0}^{N} \left( \binom{N-1}{T-1} - \binom{N-1}{T} \right) \rho^T / 2,$$

$1 < \rho < \infty$. A comparison of the coefficients of $\rho^T$ on the left and right sides leads
to $\hat\theta_N(T) = T/N - 1/2$. Similarly, for p < 1/2 we obtain $\hat\theta_N(T) = 1/2 - T/N$.
Obviously, $\hat\theta_N$ cannot satisfy both of these equations. □
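
The coefficient comparison in the proof can be verified with exact rational arithmetic. The following is a small sketch, not part of the original proof; the function name `forced_estimator` and the choice N = 12 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def forced_estimator(N, T):
    """Value forced on the estimator by matching the coefficient of rho^T (case p > 1/2)."""
    left = comb(N - 1, T - 1) if T >= 1 else 0   # binom(N-1, T-1), taken as 0 for T = 0
    right = comb(N - 1, T)                       # math.comb returns 0 when T > N - 1
    return Fraction(left - right, 2 * comb(N, T))

N = 12  # illustrative sample size
matches = all(forced_estimator(N, T) == Fraction(T, N) - Fraction(1, 2)
              for T in range(N + 1))

# For p < 1/2 the forced value is the negative of the above; the two requirements
# can only coincide where T/N - 1/2 = 0, i.e. at T = N/2.
compatible = [T for T in range(N + 1)
              if forced_estimator(N, T) == -forced_estimator(N, T)]
print(matches, compatible)
```

Since the two cases agree only at T = N/2, no single function of T can be unbiased on both sides of p = 1/2, which is the contradiction the proof relies on.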



Corollary 1. The UMVUE does not exist for $|p - 1/2|$.



We can denote the estimator as a function of T since T is a sufficient statistic [10]
for $|p - 1/2|$.
The Sample Bias: It may seem like a good idea to use the sample bias $|T/N - 1/2|$
as an estimator for the actual bias $|p - 1/2|$. In this section we show that the
sample bias cannot be used to estimate the bias when the sample size is much
smaller than $|p - 1/2|^{-2}$.

$T/N - 1/2$ approximately follows a normal distribution with mean $\mu = p - 1/2$
and variance $\sigma^2 = p(1-p)/N \approx 1/(4N)$. For $\hat\theta = |T/N - 1/2|$, it can be shown that

$$E[\hat\theta] = |\mu|\,(1 - 2\Phi(-|\mu|/\sigma)) + 2\sigma\phi(|\mu|/\sigma)$$

$$Var[\hat\theta] = \mu^2 + \sigma^2 - E[\hat\theta]^2$$

$$MSE(\hat\theta) = E[(\hat\theta - \theta)^2] = Var[\hat\theta] + (E[\hat\theta] - |\mu|)^2 = \sigma^2 + 4|\mu|^2\Phi(-|\mu|/\sigma) - 4|\mu|\sigma\phi(|\mu|/\sigma)$$

where $\phi$ and $\Phi$ denote respectively the probability density function and the
cumulative distribution function for the standard normal distribution. As can
be seen from Figure 2, to have the standard error $\sqrt{MSE}$ at least comparable to
$|p - 1/2|$, a sample size comparable to $|p - 1/2|^{-2}$ will be needed. Therefore, when
$|p - 1/2|^{-2}$ is an intractably large number (which should be the case for a secure
cipher), it will not be possible to obtain a reliable estimator from $|T/N - 1/2|$
with any practical sample size.



Fig. 2. Standard error rate $\sqrt{MSE(\hat\theta)}/\theta$ vs. $|\mu|/\sigma$, for $\hat\theta = |T/N - 1/2|$. It converges to $1/(|\mu|/\sigma)$
in both directions. Sample size for a desired error rate $\sqrt{MSE}/\theta \le e$ can be computed
from $|\mu|/\sigma \approx |p - 1/2|\sqrt{4N} \ge 1/e$, hence $N \ge \frac{1}{4e^2}|p - 1/2|^{-2}$.



If a sample size much smaller than $|p - 1/2|^{-2}$ is used, then $E[\hat\theta] \approx 1/\sqrt{2\pi N}$,
independent of $|p - 1/2|$. As an example, Table 5 gives the results of a computation
of the sample bias of the RC5 approximation (1) for w = 32 with N = 10^
plaintexts. For this sample size, we have $1/\sqrt{2\pi N}$ = 2“^^®®, which explains
the stabilization of the bias around 2~^^.
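
The floor $1/\sqrt{2\pi N}$ is what the formula for $E[\hat\theta]$ gives at $\mu = 0$ with $\sigma = 1/(2\sqrt{N})$. The sketch below illustrates this under stated assumptions: every sample bit is modeled as an unbiased coin (p = 1/2), N = 4096 is a hypothetical sample size chosen for speed and is not the one used for Table 5, and the function names are illustrative.

```python
import math
import random

def expected_sample_bias(mu, sigma):
    """E|Z| for Z ~ N(mu, sigma^2): |mu|(1 - 2*Phi(-|mu|/sigma)) + 2*sigma*phi(|mu|/sigma)."""
    r = abs(mu) / sigma
    Phi_neg = 0.5 * math.erfc(r / math.sqrt(2))             # Phi(-r)
    phi = math.exp(-r * r / 2) / math.sqrt(2 * math.pi)     # standard normal density at r
    return abs(mu) * (1 - 2 * Phi_neg) + 2 * sigma * phi

def simulated_sample_bias(N, trials, seed=0):
    """Average of |T/N - 1/2| when the approximation holds with probability exactly 1/2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        T = bin(rng.getrandbits(N)).count("1")   # number of 'hits' among N fair coin flips
        total += abs(T / N - 0.5)
    return total / trials

N = 4096                                   # hypothetical sample size
sigma = 1 / (2 * math.sqrt(N))             # standard deviation of T/N - 1/2 at p = 1/2
predicted = 1 / math.sqrt(2 * math.pi * N)
closed_form = expected_sample_bias(0.0, sigma)
simulated = simulated_sample_bias(N, trials=4000)
print(predicted, closed_form, simulated)
```

Even with no bias at all, the sample bias averages about $1/\sqrt{2\pi N}$, so values of that size in a small sample say nothing about the true bias.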

® This sample size requirement should not be confused with the similar plaintext 
requirement for an attack. 
Table 5. Average sample bias of the RC5 approximation (1) for w = 32 with 10^
plaintexts, on 500 randomly chosen keys for each r. The results show an alarmingly
high bias for a 64-bit block cipher.



The MLE: Another important point estimator in statistics is the maximum
likelihood estimator (MLE). The MLE for $|p - 1/2|$ would be the value $\theta^*$ that
maximizes the likelihood function

(( 1(1/2)^, if 0 = 0 (( 

Unfortunately, there is no easy way to compute $\theta^*$. Nevertheless, we can obtain
a bound on the reliability of the MLE by assuming availability of some extra
information, such as whether or not p > 1/2. If we know p > 1/2, then

$$\theta^* = \begin{cases} 0, & \text{if } T < N/2 \\ T/N - 1/2, & \text{otherwise} \end{cases}$$

and vice versa for p < 1/2; which is not any more reliable than the sample bias.

6 Conclusions 

Looking at the tests summarized in this paper, we conclude that the PUL method 
gives quite accurate estimates with SP-structured Feistel ciphers, especially for 
approximations with a single active s-box per round. With increasing number 
of rounds, the actual bias values follow the PUL estimates quite closely until
the PUL estimates become much less than $2^{-(n+1)/2}$. After that point the actual bias
remains stabilized around $2^{-n/2-1}$ and does not get much lower.

The experiments on RC5 show that the performance of the PUL method may 
not be as good with other kinds of ciphers. In the case of the RC5 approxima- 
tion tested, there is a considerable difference between the estimated and actual 
values even for a small number of rounds. In certain cases, the bias is significantly
higher than $2^{-(n+1)/2}$ even after the estimated values become lower than $2^{-(n+1)/2}$. The
inaccuracy of the PUL estimates increases with larger block sizes, so even greater 
differences between actual and estimated values should be expected with 64- and 
128-bit blocks. 

We analyzed several other techniques for an alternative estimation method 
that would give more accurate estimates than the PUL method in general. Our 
attempts to obtain good estimators by statistical techniques from a random 
sample of plaintext blocks did not provide any useful results, especially when 
the inverse square of the bias is an impractically large amount for a sample size. 

The theory of correlation matrices provides some opportunities for an alter- 
native estimation method. By this theory, it may be possible to obtain improve- 
ments over the PUL method by using more than a single trail for bias estimation. 
We gave an example of such an application on RC5. The method gave some improvements
over the PUL method. But eventually its estimates also fell far from
the actual bias values with increasing number of rounds. The main reason for 
this deviation is that the number of trails that can be considered in bias esti- 
mation can be no more than a certain tractable number; but the number of all 
trails that contribute to the bias of an approximation increases exponentially 
with the number of rounds. 

Another theory used as an alternative method for bias estimation is the 
theory of linear hulls. In Section 5.2 we pointed out some problems with the
current applications of this theory. The main problem with the current practice
is that the theoretical results on linear hulls regarding combining the bias of
different linear trails is proven only for DES-like ciphers, whereas in practice 
these results are used for different kinds of ciphers (e.g., RC5, RC6, SAFER). 

We conclude that the PUL method is quite an effective method for bias 
estimation with SP-structured Feistel ciphers, especially for approximations with 
at most one active s-box at each round and with a bias considerably higher than
$2^{-(n+1)/2}$. It is an open problem to find an equally effective method for non-Feistel
ciphers and for the ciphers with too many rounds for the PUL method to give a 
meaningful value. 
A Chabaud-Vaudenay Theorem on Max. Non-Linearity

Theorem 4 (in Chabaud-Vaudenay [4]) For $K = \{0,1\}$ and $F : K^p \to K^q$,

$$\Lambda_F \ge \frac{1}{2}\left(3 \times 2^p - 2 - 2\,\frac{(2^p - 1)(2^{p-1} - 1)}{2^q - 1}\right)^{1/2}$$

where $\Lambda_F = \max_{b \ne 0,\, a} \left|\, |\{x \in K^p : a \cdot x \oplus b \cdot F(x) = 0\}| - \frac{|K^p|}{2} \,\right|$.
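
As a worked step (not spelled out in the original), substituting p = q = n into this bound and dividing the count $\Lambda_F$ by the number $2^n$ of inputs gives the lower bound on the bias of the best approximation used in Section 4.1:

```latex
\begin{align*}
\Lambda_F &\ge \tfrac12\Bigl(3\cdot 2^{n} - 2 - 2\,\frac{(2^{n}-1)(2^{n-1}-1)}{2^{n}-1}\Bigr)^{1/2}
  = \tfrac12\bigl(3\cdot 2^{n} - 2 - 2^{n} + 2\bigr)^{1/2}
  = \tfrac12\bigl(2^{n+1}\bigr)^{1/2} = 2^{(n-1)/2},\\[4pt]
\max_{b \neq 0,\, a}\Bigl|\,p_{a,b} - \tfrac12\Bigr| &= \frac{\Lambda_F}{2^{n}}
  \;\ge\; 2^{(n-1)/2 - n} \;=\; 2^{-(n+1)/2}.
\end{align*}
```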



B Fundamental Theorem of Linear Hulls 

Theorem 1 (in Nyberg [14]) For $X \in \{0,1\}^m$, $K \in \{0,1\}^l$, $F : \{0,1\}^m \times \{0,1\}^l \to \{0,1\}^n$,
if X and K are independent random variables, and K is
uniformly distributed, then for all $a \in \{0,1\}^m$, $b \in \{0,1\}^n$

$$2^{-l} \sum_{k \in \{0,1\}^l} \left| P_X(a \cdot X \oplus b \cdot F(X,k) = 0) - \tfrac{1}{2} \right|^2 = \sum_{c \in \{0,1\}^l} \left| P_{X,K}(a \cdot X \oplus b \cdot F(X,K) \oplus c \cdot K = 0) - \tfrac{1}{2} \right|^2$$

During a linear attack, the key K is a fixed parameter, so the bias of interest
is the bias on the left side of the equation; i.e., $|P_X(a \cdot X \oplus b \cdot F(X,k) = 0) - \tfrac{1}{2}|$.
The summation on the right is the squared bias with a random-variable key, over
all key masks c. For DES-like ciphers, this right-hand side summation can be
turned into a summation of the PUL bias over linear trails (Theorem 2 in [14]).

The key mask does not matter here since the key is fixed.
Abstract. We discuss measures of statistical uncertainty relevant to
determining random values in cryptology. It is shown that unbalanced 
and self-similar Huffman trees have extremal properties with respect to 
these measures. Their corresponding probability distributions exhibit an 
unbounded gap between (Shannon) entropy and the logarithm of the 
minimum search space size necessary to be guaranteed a certain chance 
of success (called marginal guesswork). Thus, there can be no general inequality
between them. We discuss the implications of this result in terms
of the security of weak secrets against brute-force searching attacks, and 
also in terms of Shannon’s uncertainty axioms. 



A Introduction 

It has become popular folklore that any “uncertainty” in cryptology can always 
be measured by (Shannon) entropy. This belief likely traces back to a result of 
Shannon’s jlSIJ that entropy is the only quantity which satisfies a set of uncer- 
tainty axioms. While information theory continues to be of great importance to 
cryptology (see e.g. |10I17| and references therein), in this paper we challenge the 
less-rigorous folklore. Specifically we compare entropy with another eminently 
reasonable uncertainty measure, marginal guesswork, which we define as the op- 
timal number of trials necessary to be guaranteed a certain chance of guessing 
a random value in a brute-force search. Our main result is Theorem 1 below,
which basically says that there can be no general relationship (i.e. no inequality 
can hold) between entropy and the logarithm of marginal guesswork. 

It is now well-established that there are a variety of different uncertainty 
measures important to cryptology. Recent scholarship suggests a hierarchy of 
inequalities surrounding entropy (see in particular Cachin’s summary of [31 Table 
3.1]), and counterexamples exist (see e.g. an important one due to Massey |S]) 
which show that many of these inequalities are not tight. Our result adds to 
this overall picture in a rather negative way: marginal guesswork, which we 
shall argue to be as meaningful a measure of uncertainty as any within the 
secret-guessing paradigm (see Sect. B.2 and Remark 1 below), cannot exist
with entropy in any hierarchy of inequalities. This rules out even vague notions 
that entropy may uniquely measure uncertainty at some level of granularity. 

B. Roy and E. Okamoto (Eds.): INDOCRYPT 2000, LNCS 1977, pp. 67-79, 2000.

(c) Springer-Verlag Berlin Heidelberg 2000
In Sect. B we recall some basic definitions in order to precisely state our
main result in Sect. C. We conclude in Sect. D with a discussion about
why Shannon’s uncertainty axioms don’t apply in such an extreme way to the 
secret-guessing paradigm. 

B Preliminaries 

In this section, we recall the definitions and some basic properties of the two 
uncertainty measures studied in this paper: entropy and marginal guesswork. 
We assume throughout that X is a discrete random variable taking on values in
the set $\mathcal{X} = \{x_1, x_2, \ldots\}$. Following both Shannon [15] and Massey [9], we assume
no bound on the size of $\mathcal{X}$. For our purposes however, we need only consider
random variables which have finite support (only finitely many probabilities
$p_i = P[X = x_i]$ are nonzero). It will be clear from the context whether occasionally
$\mathcal{X}$ itself should be assumed finite.

Both uncertainty measures we consider are related to determining the value 
of X given certain oracles. Speaking loosely for now: 

— Entropy measures how difficult it is to determine X given single queries to
multiple oracles which answer questions of the form, “Is X an element of
subset $\mathcal{C} \subseteq \mathcal{X}$?”.

— Marginal guesswork measures how difficult it is to determine X given multiple
queries to a single oracle which answers the question, “Is X = x?”.

It turns out that these two querying strategies amount to different ways of 
traversing a Huffman tree as discussed below. 

B.1 Entropy: The Uncertainty of Description

Definition. The entropy of X is defined by

$$H(X) = -\sum_i p_i \log p_i,$$

where these and other logarithms are taken to be base 2. When there are only 
two nonzero probabilities, p and (1 — p), the binary entropy function is sometimes 
written H{p) = —plogp — (1 — p) log(l — p). 

Informally, entropy is a measure of redundancy of natural languages and 
other information sources. It is used extensively in the theory of communication 
systems to quantify the optimal use of bandwidth in noiseless and noisy channels 
[1,4]. Since plaintext redundancy limits cipher security, entropy is often useful
in cryptology [16,17].



Defining Algorithm: Prefix Codes and Huffman Trees. A code tree is 
any binary tree for which the directions left and right are assigned binary digits 
0 and 1, respectively. Following the path from the root to a leaf $\ell$ determines a
unique codeword $c(\ell) \in \{0,1\}^*$. The mapping c defines a prefix code because no
proper prefix of a codeword is a codeword, and so arbitrary concatenations of
codewords may be uniquely decoded. Certain code trees correspond to optimal
compression, because the average codeword length is minimal.

For our purposes, a Huffman tree is a binary tree, all of whose non-leaves 
have two children, and whose leaves are arranged, from left to right, in order 
of non-decreasing distance from the root of the tree. Not every binary tree can 
be transformed into a Huffman tree by permuting its branches. Each Huffman
tree with leaves $\mathcal{Y} = \{y_1, \ldots, y_n\}$ corresponds to a $\mathcal{Y}$-valued random variable
Y with distribution $q_i = P[Y = y_i] = 2^{-|c(y_i)|}$, where $|c(y_i)|$ is the length of the
codeword corresponding to $y_i$, or equivalently the distance from the root of the
tree to $y_i$. For this random variable, the entropy is exactly the average codeword
length, which is optimal. In other words,

$$H(Y) = \sum_i q_i\,|c(y_i)| \le \sum_i q_i\,|c^*(y_i)|,$$

for any other prefix code $c^*$.

That the converse is nearly true constitutes one of the beautiful elementary results
of information theory [1,4]: every $\mathcal{X}$-valued random variable X corresponds
to a Huffman tree with leaves $\mathcal{X}$, and its prefix code $c : \mathcal{X} \to \{0,1\}^*$ is optimal
(called the Huffman code for X). Furthermore, H(X) is within a single bit of
the optimal average codeword length, i.e.

$$H(X) \le \sum_i p_i\,|c(x_i)| < H(X) + 1.$$


Thus entropy characterizes optimal encoding, and so can be seen as a measure 
of how difficult it is to determine a random value X given oracles which tell us 
whether, “Is X an element of $\mathcal{C} \subseteq \mathcal{X}$?”. Indeed, by collecting into subsets those
codewords which share common prefixes, we may formalize this characterization 
of entropy in the following algorithm. 

Algorithm 1 traverses the Huffman tree of X from its root to the leaf which
corresponds to the value of X. The average time complexity of this algorithm 
is clearly just the optimal average codeword length and hence is within 1 bit of 
H{X). 
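
The traversal can be sketched in Python as follows. This is an illustration under assumptions, not the paper's own code: the Huffman code is built with a standard heap-based construction, the dyadic example distribution is hypothetical, and the oracle is simulated by a closure that knows the secret.

```python
import heapq
from itertools import count
from math import log2

def huffman_code(probs):
    """Return {symbol: codeword} for a dict of symbol probabilities (standard Huffman merge)."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def hfind(decode, has_prefix, s=""):
    """Recursive search: extend the known prefix s one bit at a time using the oracle."""
    if s in decode:                      # s is a complete codeword
        return decode[s]
    if has_prefix(s + "0"):              # one oracle query per tree level
        return hfind(decode, has_prefix, s + "0")
    return hfind(decode, has_prefix, s + "1")

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # hypothetical dyadic distribution
code = huffman_code(probs)
decode = {w: sym for sym, w in code.items()}

queries = {}
for secret in probs:
    calls = []
    def oracle(w, secret=secret, calls=calls):
        calls.append(w)                              # count oracle queries
        return code[secret].startswith(w)
    assert hfind(decode, oracle) == secret
    queries[secret] = len(calls)

average_queries = sum(probs[s] * queries[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())
print(average_queries, entropy)
```

For a dyadic distribution such as this one the average number of queries equals the entropy exactly; in general it lies within one bit of it.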

B.2 Marginal Guesswork: The Uncertainty of Searching 

Definition. For a random variable X, it is convenient and common to rearrange
the probabilities $p_i = P[X = x_i]$ into non-increasing order

$$p_{[1]} \ge p_{[2]} \ge \cdots \ge p_{[n]} > p_{[n+1]} = \cdots = 0, \qquad (1)$$

^ The family of prefix codes does not constitute all uniquely decodable codes, but 
codes outside of this family are not relevant to this paper. 
A recursive algorithm to determine the value of X given: (i) its Huffman code
$c : \mathcal{X} \to \{0,1\}^*$, and (ii) oracles that answer whether c(X) has prefix $w \in \{0,1\}^*$. The
codeword corresponding to X is ultimately returned when the empty word is passed
as the initial argument, i.e. $X = \mathrm{Hfind}(\epsilon)$.

function Hfind(s):
  if $s \in c(\mathcal{X})$ then
    return $c^{-1}(s)$.
  endif
  if $s \cdot 0$ is a prefix of c(X) then
    return Hfind($s \cdot 0$).
  else
    return Hfind($s \cdot 1$).
  endif



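where n is the least integer satisfying (1). Thus if $x_{[i]}$ is the i-th most probable
event, we have $p_{[i]} = P[X = x_{[i]}]$. The marginal guesswork (or when necessary,
the $\alpha$-marginal guesswork) of X is defined by

$$w_\alpha(X) = \min\left\{\, i \;:\; \sum_{j=1}^{i} p_{[j]} \ge \alpha \right\}.$$
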
Informally, marginal guesswork measures the maximum cost of determining 
the value of X when we wish to be guaranteed an appreciable chance of success 
(the probability a) in a brute-force search. In the case of searching for the secret 
key of a cipher given known plaintext-ciphertext pairs, it has been quantified 
under certain circumstances m- 

More generally, a plot of the cumulative probability of success as a function 
of the minimum number of trials (i.e. the transpose of the profile of Wa{X)) 
is essentially a measure of the non-uniformity of X which predates information 
theory by a half century. Indeed, the Lorenz curve of economics, which is a plot 
of cumulative wealth as a function of percentile group, has been widely used 
to quantify non-uniformity in the distribution of wealth [ZEU, and has deep 
connections to majorization in the theory of inequalities . 
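
As a concrete illustration of the definition, the sketch below computes the $\alpha$-marginal guesswork by sorting the probabilities and accumulating them until the target $\alpha$ is reached. The function name, the floating-point tolerance and the example distributions are assumptions made for this illustration.

```python
def marginal_guesswork(probs, alpha, tol=1e-12):
    """Smallest i such that the i largest probabilities sum to at least alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    total = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= alpha - tol:   # tolerance guards against floating-point round-off
            return i
    raise ValueError("probabilities sum to less than alpha")

skewed = [0.125, 0.5, 0.125, 0.25]   # hypothetical non-uniform secret
uniform = [0.125] * 8                # uniform over 8 values

print(marginal_guesswork(skewed, 0.5))    # one guess already succeeds half the time
print(marginal_guesswork(skewed, 0.6))
print(marginal_guesswork(skewed, 1.0))
print(marginal_guesswork(uniform, 0.3))   # ceil(0.3 * 8) guesses for a uniform secret
```

Plotting the returned value against $\alpha$ (or its transpose) gives the Lorenz-type curve described above.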



Defining Algorithm: Optimal Brute-Force Searches. Many situations in 
computer security force an adversary to conduct a brute-force search for the 
value of X by enumerating some elements of $\mathcal{X}$ and testing for a certain success
condition. The only possible luxury afforded to the adversary is that he may 
know which events are more likely, and hence be able to search in order of 
decreasing likelihood. The necessary success condition is abstracted as an oracle 
which answers whether X = x, and often occurs in practice. For example, UNIX 
passwords are routinely guessed in this manner because for any candidate 
password x, it can be checked whether its hash f{x) exists in a table of such 
hashes (which is often available publicly). 
In cryptology, the secret key X of a symmetric key cipher almost always
comes along with such an oracle. The reason is that given a small amount of 
ciphertext, a candidate key can be used to decrypt and produce candidate plain- 
text. Most plaintext - whether compressed or not - has certain features which 
suffice to rule out false keys. But such oracles are not limited to symmetric key 
ciphers. Even elaborate multipass protocols which employ public key primitives 
can be subverted in order to yield an oracle of this form sufficient to break them 
m\- The safest bet for the cryptographer is to assume that the adversary has 
complete knowledge of the probabilities $\{p_i\}$ and will conduct any search optimally,
i.e. in the order given by (1). This suggests the following optimal brute-force
search algorithm. Clearly, the probability of success of Algorithm B.2 is
$\sum_{i=1}^{w_\alpha(X)} p_{[i]} \ge \alpha$, and it is the least costly algorithm to guarantee chance $\alpha$.

Optimal brute-force search for the value of X which guarantees a chance of success of
$\alpha$. Assumes an oracle which answers whether X = x. The algorithm may not succeed,
in which case 0 is returned.

function Wfind($\alpha$):
  for $i \in \{1, \ldots, w_\alpha(X)\}$ do
    if $X = x_{[i]}$ then
      return $x_{[i]}$.
    endif
  done
  return 0.


Notice also that since the Huffman tree of X is arranged, from left to right,
in order of non-decreasing length (non-increasing probability), Algorithm B.2
traverses the Huffman tree of X along the leaves from left to right.



Guesswork and Optimal Exhaustive Searches. When a = 1, we shall 
call the optimal search of Algorithm B.2 an exhaustive search, because it will
exhaust all possible elements of $\mathcal{X}$ until it succeeds. In practice, not all brute-force
searches are exhaustive. The search for passwords is rarely exhaustive
im, while the search for cipher keys is often exhaustive [5]. The choice of a in 
a brute-force attack depends, of course, on the goals of the adversary and the 
nature of the problem. 

The average time complexity of Algorithm B.2 in an exhaustive search is
given by

$$W(X) = \sum_i i\, p_{[i]}, \qquad (2)$$

which we shall call simply the guesswork of X. It is readily verified that the
guesswork of X is the average of its $\alpha$-marginal guesswork, over all values of $\alpha$.
Thus

$$W(X) = \int_0^1 w_\alpha(X)\, d\alpha,$$

so that guesswork can be seen as the area under the cryptologic Lorenz curve,
$w_\alpha(X)$.
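
The identity between guesswork and the averaged marginal guesswork can be checked numerically. The sketch below uses exact fractions and a midpoint sum over $\alpha$; the example distribution and the number of steps are illustrative assumptions, chosen so that every breakpoint of the step function falls on a cell boundary and the sum is exact.

```python
from fractions import Fraction

def guesswork(probs):
    """W(X) = sum over i of i * p_[i], with probabilities sorted in non-increasing order."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))

def marginal_guesswork(probs, alpha):
    """Smallest i such that the i largest probabilities sum to at least alpha."""
    total = 0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= alpha:
            return i
    raise ValueError("probabilities sum to less than alpha")

probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # hypothetical X

steps = 1000
# Midpoint rule for the integral of w_alpha(X) over alpha in (0, 1).
area = Fraction(
    sum(marginal_guesswork(probs, Fraction(2 * k + 1, 2 * steps)) for k in range(steps)),
    steps,
)
print(guesswork(probs), area)
```

Both quantities come out as 15/8 for this distribution, matching the interpretation of guesswork as the area under the $w_\alpha(X)$ curve.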

Guesswork has been used to study cipher security in mM- It was first 
introduced in cryptology by Massey [9] in a context similar to our own, and he
obtained a bound

$$W(X) \ge 2^{H(X)-2} + 1 \qquad (3)$$

when $H(X) \ge 2$. Massey didn’t give the quantity of (2) a name, but in light
of its Lorenz curve interpretation, it is not surprising that it also appeared in 
economics nearly a century ago as the crucial term in the Gini coefficient ITO81 . 



Remark 1. The guesswork W(X) is not the principal object of study here because
like any average it could easily obscure the region of greatest interest.
Cryptanalytic adversaries often don’t care if W(X) is intractably large, say $2^{128}$,
when $w_{0.001}(X)$ is relatively small, say around $2^{80}$. They might be content to
carry out 1,000 separate and independent attacks, expending a total effort of
about $2^{90}$ guesses and be guaranteed a 63% chance of success (from standard
calculus).

Thus the complete picture about the security of a secret X must include the
entire profile of $w_\alpha(X)$ rather than merely its average W(X). □
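
The arithmetic behind the remark is short enough to spell out. The sketch below assumes, as in the remark, 1,000 independent attacks that each succeed with probability 0.001 at a cost of roughly $2^{80}$ guesses; the variable names are illustrative.

```python
import math

alpha = 0.001          # success probability of a single attack
attacks = 1000         # number of independent attacks
log2_cost_each = 80    # log2 of the guesses per attack (figure taken from the remark)

# Probability that at least one of the independent attacks succeeds: 1 - (1 - alpha)^attacks,
# which approaches 1 - 1/e as alpha * attacks = 1 and alpha -> 0.
success = 1 - (1 - alpha) ** attacks
log2_total = math.log2(attacks) + log2_cost_each   # log2 of the total number of guesses

print(round(success, 4), round(log2_total, 2))
```

The total effort is just under $2^{90}$ guesses, and the overall success probability is close to $1 - 1/e$.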



B.3 When Marginal Guesswork and Exponentiated Entropy Agree 

When X is uniform on some subset of $\mathcal{X}$ we get a kind of agreement between
entropy and marginal guesswork given by,

$$H(X) \approx \log w_\alpha(X). \qquad (4)$$

This occurs in several interesting cases. 



Minimum Uncertainty. When X is deterministic, i.e. it takes on only one 
value with nonzero probability, it is easily seen that H{X) = 0 = logWa(A). 



Maximum Uncertainty. When X is uniform over finite it is easily seen 
that H{X) = logical, and Wa{X) = |'q;|.£'|]. Equation (|4|) follows for fixed a 
since H{X) = log \ = 0(log \ fX'\ + log(a)) = 0(log Wc(A)). 

^ Guesswork is sometimes called guessing entropy, though as far as we are aware, never 
without enclosing quotation marks. 
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Long Random Sequences (The Asymptotic Equipartition Property). 

Consider the random tuple $X = (Y_1, \ldots, Y_m)$, where $\{Y_i\}$ are independent and
identically distributed $\mathcal{Y}$-valued random variables for some $\mathcal{Y} = \{y_1, y_2, \ldots\}$.
The AEP | 1I4| informally tells us that for large m, X behaves as if it were 
uniform on a subset of $\mathcal{Y}^m$, and we would again expect (4) to hold.

The AEP is analogous to the law of large numbers and can be formally 
stated as follows. Suppose each Yi is drawn according to probability distribution 
q : ‘S/ — > R. Following |^, we define the (strangely self-referential) R- valued 
random variables $q(Y_i)$ by $P[q(Y_i) = q(y_j)] = P[Y_i = y_j] = q(y_j)$, and similarly
$q(Y_1, \ldots, Y_m) = q(Y_1) \cdots q(Y_m)$. Note that $q(Y_i)$ takes on at most $|\mathcal{Y}|$ real values,
namely the actual probabilities of $Y_i$. Writing $H(Y) = H(Y_i)$ for any i, we have,
— logg(Yi, . . . ,Ym)lm H{Y) in probability, as to — > oo (see [T] and cf. a)- 
Thus for every $\varepsilon > 0$, there is an $m$ such that

$$P\left[\,\left| -\frac{1}{m}\log q(Y_1, \ldots, Y_m) - H(Y) \right| < \varepsilon \right] > 1 - \varepsilon.$$



In essence, drawing a value of $q(Y_1, \ldots, Y_m)$ means drawing the probability of a
tuple $X = (Y_1, \ldots, Y_m)$. For fixed $m$ and $\varepsilon$, it is natural to divide the set of all
$|\mathcal{Y}|^m$ sequences into the typical sequences for which

$$2^{-m(H(Y)+\varepsilon)} < P[X = (y_1, \ldots, y_m)] < 2^{-m(H(Y)-\varepsilon)}$$

and the remaining atypical sequences. Thus it is clear that we recover (4) for
sufficiently large $m$: $H(X) \approx mH(Y) \approx mH(Y) + \log\alpha \approx \log w_\alpha(X)$.

Evidently, random sequences of i.i.d. values constitute special random variables
for which entropy and (logarithmic) marginal guesswork essentially agree
as in (4). However, as we shall see in the sequel, no equivalent of the AEP need
hold for a general random variable.



C The Main Result 

It is easy to see that there are random variables for which $2^{H(X)}$ is much greater
than $w_\alpha(X)$; simply choose $p_{[1]} = \alpha$ and all remaining probabilities sufficiently
small. This is essentially the example used by Massey [S] to show that the guess- 
work inequality of m is not tight. The details for marginal guesswork are given 
in the proof to Theorem [T] below. 

The following lemma shows that the opposite can happen as well. The cumulative
probability of the most likely $2^{\lceil H(X)\rceil}$ events can be arbitrarily close to
zero.

Lemma 1. For any real $\varepsilon > 0$, there exists a random variable X with finitely
many nonzero probabilities $p_i = P[X = x_i]$, such that

$$\sum_{i=1}^{2^{\lceil H(X)\rceil}} p_{[i]} < \varepsilon.$$

Proof. We define a family of random variables $X_{j,k}$ parameterized by integers j
and k. Specifically for each $j, k \ge 1$, write $a = 2^j$ and define $X_{j,k}$ by the sequence
of probabilities $a^{-1}, a^{-2}, \ldots, a^{-k}$, followed by $m$ copies of the last value $a^{-k}$,
where $m$ must be chosen so that the probabilities add to 1. These probabilities
come from self-similar Huffman trees as discussed in Fig. 1 below. It is easy to
show that $m$ must be given by

$$m = \frac{1 + (a-2)a^k}{a-1}.$$

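Now let us examine the entropies of this family.

$$\begin{aligned}
H_{j,k} = H(X_{j,k}) &= \sum_{i=1}^{k} a^{-i}\log a^{i} + \frac{1+(a-2)a^k}{a-1}\, a^{-k}\log a^{k} \\
&= j\sum_{i=1}^{k} i\,a^{-i} + jk\,\frac{1+(a-2)a^k}{(a-1)a^k} \\
&= j\,\frac{a^{k+1}-(k+1)a+k}{(a-1)^2 a^k} + jk\,\frac{1+(a-2)a^k}{(a-1)a^k} \\
&= j\,\frac{k a^{k+2} + (1-3k)a^{k+1} + 2k a^k - a}{(a-1)^2 a^k} \\
&= jk\,\frac{a-2}{a-1} + h_{j,k},
\end{aligned}$$

where

$$h_{j,k} = \frac{j(a^k - 1)}{a^{k-1}(a-1)^2}.$$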

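Let us fix a lower bound $2 \le j$ so that $a \ge 4$ as well. Then it is easy to see that

$$jk\,\frac{a-2}{a-1} > \log k,$$

and thus that $2^{\lceil H_{j,k}\rceil} \ge 2^{H_{j,k}} > 2^{jk(a-2)/(a-1)} > k$. In this case, we may calculate the
cumulative sum

$$\sum_{i=1}^{2^{\lceil H_{j,k}\rceil}} p_{[i]} = \sum_{i=1}^{k} a^{-i} + \left(2^{\lceil H_{j,k}\rceil} - k\right)a^{-k} = \frac{1}{2^j - 1} + s_{j,k},$$

where

$$s_{j,k} = \frac{(a-1)\left(2^{\lceil H_{j,k}\rceil} - k\right) - 1}{a^k(a-1)}.$$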


We claim that for fixed j, $s_{j,k} \to 0$ as $k \to \infty$. If that were true, then for any
$\varepsilon > 0$, we may fix a $j \ge 2$ such that

$$\frac{1}{2^j - 1} < \frac{\varepsilon}{2}$$

and find a k such that $s_{j,k} < \varepsilon/2$. Thus the proof would be complete.

We now turn our attention to finding the limit of $s_{j,k}$ as $k \to \infty$. For some
$k(j)$, $h_{j,k} < 1$ for all $k > k(j)$, because $h_{j,k} \to ja/(a-1)^2 < 1$ as $k \to \infty$. Thus for $k > k(j)$,

$$\lceil H_{j,k} \rceil < jk\,\frac{a-2}{a-1} + 2,$$

allowing us to give the following upper bound on $s_{j,k}$:

$$s_{j,k} \le \frac{(a-1)(4\beta^k - k) - 1}{a^k(a-1)},$$

where $\beta = 2^{j(a-2)/(a-1)} < a$. Finally, by two applications of L'Hospital's rule we see
that $s_{j,k} \to 0$ as $k \to \infty$. □
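The construction in the proof can be explored with exact rational arithmetic. The sketch below assumes the family is as read from the proof: with $a = 2^j$, the probabilities are $a^{-1}, \ldots, a^{-k}$ followed by $m$ copies of $a^{-k}$, and the quantity of interest is the total probability of the $2^{\lceil H \rceil}$ most likely values. The function names are hypothetical; the entropy is computed directly from the probabilities rather than from a closed form.

```python
from fractions import Fraction
import math

def family(j, k):
    """Return (a, m, H) for X_{j,k}; H is the exact entropy in bits."""
    a = 2 ** j
    m, rem = divmod(1 + (a - 2) * a ** k, a - 1)
    assert rem == 0  # m comes out as an integer
    # A value of probability a^-i contributes a^-i * log2(a^i) = a^-i * i * j bits.
    H = sum(Fraction(i * j, a ** i) for i in range(1, k + 1))
    H += Fraction(m * k * j, a ** k)  # the m trailing copies of a^-k
    return a, m, H

def head_mass(j, k):
    """Total probability of the 2^ceil(H) most likely values of X_{j,k}."""
    a, m, H = family(j, k)
    n = 2 ** math.ceil(H)
    assert k < n <= k + m  # the head ends inside the run of equal probabilities
    distinct = sum(Fraction(1, a ** i) for i in range(1, k + 1))
    return distinct + Fraction(n - k, a ** k)

# For fixed j the head mass shrinks towards 1 / (2^j - 1) as k grows.
for k in (4, 8, 12):
    print(k, float(head_mass(2, k)))
```

Taking j larger lowers the limiting value $1/(2^j - 1)$, which is how the proof drives the head mass below any given $\varepsilon$.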

Remark 2. Each random variable $X_{j,k}$ in the proof of the previous lemma corresponds
to a self-similar Huffman tree, $t_{j,k}$. Fig. 1 shows $t_{2,3}$. The trees are
self-similar in that each $t_{j,k}$ contains subtrees isomorphic to $t_{j,i}$ for all $i < k$.
The self-similarity is also characterized by a zig-zag pattern of the path starting
from the root and going alternately through the maximal proper codeword prefixes
and the roots of the trees isomorphic to $t_{j,i}$, for $i < k$. □



Fig. 1. The random variables in the proof of Lemma 1 correspond to self-similar
Huffman trees as described in Remark 2 (the tree shown is $t_{2,3}$)



Our main result formalizes the notion that there are random variables for
which $H(X)$ and $\log w_\alpha(X)$ could be arbitrarily far away from one another. In
other words, to determine a secret, the effective search space size (measured in
bits) could be arbitrarily more or less than the entropy of that secret.

Theorem 1. For each $0 < \alpha < 1$ and every integer $N > 0$, there are random
variables X and Y with finite support satisfying $\log w_\alpha(X) - H(X) > N$, and
$H(Y) - \log w_\alpha(Y) > N$.

Proof. First we find X. Write $H = H(X)$ and choose the $\varepsilon$ of Lemma 1 satisfying
$\varepsilon = \alpha/2^N$. Then

$$\sum_{i=1}^{2^N 2^{\lceil H\rceil}} p_{[i]} \le 2^N \sum_{i=1}^{2^{\lceil H\rceil}} p_{[i]} < 2^N \varepsilon = \alpha.$$

We conclude that $w_\alpha(X) > 2^N 2^{\lceil H\rceil} \ge 2^{N+H}$, and thus that $\log w_\alpha(X) - H > N$.

We now turn to finding Y. Consider the probabilities $q_i = P[Y = y_i]$ given by
$q_{[1]} = \alpha$ followed by $q_{[i]} = (1-\alpha)/2^k$, $2 \le i \le 2^k + 1$. In other words, the same
smallest probability is repeated $2^k$ times. This corresponds to an unbalanced
Huffman tree as in Fig. 2. It is easy to see that $w_\alpha(Y) = 1$, and

$$H(Y) = -\alpha\log\alpha - (1-\alpha)\log\frac{1-\alpha}{2^k} = (1-\alpha)k + H(\alpha).$$

The desired result is obtained when $H(Y) > N$, which is easily achieved if we
choose

$$k \ge \frac{N}{1-\alpha}.$$

□
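The second half of the proof is easy to reproduce numerically. The following is a minimal sketch, assuming the distribution of Y is one value of probability $\alpha$ followed by $2^k$ equal values sharing the remaining mass, with $k = \lceil N/(1-\alpha) \rceil$. The helper names and the parameter choice $\alpha = 0.25$, $N = 10$ are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal_guesswork(probs, alpha):
    """Smallest number of guesses, most likely first, whose cumulative
    probability reaches alpha (assumed definition of w_alpha)."""
    cum = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        cum += p
        if cum >= alpha:
            return i
    return len(probs)

def unbalanced(alpha, N):
    """One heavy value of mass alpha, then 2^k equal light values."""
    k = math.ceil(N / (1 - alpha))
    return [alpha] + [(1 - alpha) / 2 ** k] * 2 ** k

alpha, N = 0.25, 10
q = unbalanced(alpha, N)
gap = entropy(q) - math.log2(marginal_guesswork(q, alpha))
print(len(q), entropy(q), gap)
```

A single guess already succeeds with probability $\alpha$, so $\log w_\alpha(Y) = 0$ while the entropy grows linearly in k.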




Fig. 2. An unbalanced Huffman tree demonstrating $H(X) \gg \log w_\alpha(X)$, as in Theorem 1



D Discussion: Shannon’s Axioms 

Any measure of uncertainty about X should indicate the difficulty of determining 
the value of X under a certain set of rules. In fact, entropy and marginal guess- 
work are the average and worst-case time complexities of querying strategies as 
formalized in Algorithms [I] and IB. 21 respectively. Furthermore these algorithms 
represent two optimal ways of traversing the same combinatorial object, namely 
the code tree corresponding to the Huffman code of X. 

While there are a variety of cases where they agree, we have seen that $H(X)$
is not, in general, a good indicator of $\log w_\alpha(X)$. Yet Shannon's uncertainty






axioms along with a great deal of folklore seem to admit a single, unique (loga- 
rithmic) uncertainty, namely entropy. Let us recall these axioms, one or more of 
which must be inconsistent with the guessing paradigm for determining a secret. 
Paraphrasing from |16| . an uncertainty measure U must satisfy (cf. |ll4p : 

1. $U(X)$ must be a continuous function of the probabilities $p_i = P[X = x_i]$.

2. For uniformly distributed X, $U(X)$ must be a monotonically increasing function
of $|\mathcal{X}|$.

3. (grouping axiom) Suppose $\mathcal{X}$ is decomposed into mutually disjoint events
$y_j \subset \mathcal{X}$, $1 \le j \le m$. Let $\mathcal{Y} = \{y_1, \ldots, y_m\}$ and define the $\mathcal{Y}$-valued random
variable Y by $q_j = P[Y = y_j] = P[X \in y_j]$. The uncertainty $U(X)$ must
satisfy

$$U(X) = U(Y) + \sum_{j=1}^{m} q_j\, U(X|y_j),$$

where for each j, $U(X|y_j)$ is the uncertainty corresponding to the a posteriori
probabilities given by $p(x_i|y_j) = p_i/q_j$ if $x_i \in y_j$, and 0 otherwise. (Note that
for entropy, this axiom is a special case of the chain rule of information theory 
01 : H{X, Y) = H{Y) + H{X\Y).) 



Even though $\log w_\alpha(X)$ is not continuous, it could be replaced by a similar
continuous function, so axiom 1 is really not a problem. Since $\log\lceil\alpha|\mathcal{X}|\rceil$ is
monotonically increasing, axiom 2 holds for logarithmic marginal guesswork.
However, we claim that the grouping axiom 3 cannot hold for $\log w_\alpha(X)$. If we
assume otherwise, we must have

$$w_\alpha(X) = w_\alpha(Y) \prod_{j=1}^{m} w_\alpha(X|y_j)^{q_j}. \qquad (5)$$
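A small numerical check makes the contrast concrete. The sketch below evaluates both sides of the grouping axiom for entropy and for logarithmic marginal guesswork on a toy partition. The distribution, the partition into two blocks and the choice $\alpha = 0.8$ are hypothetical, and $w_\alpha$ is assumed to be the smallest number of most-likely-first guesses whose cumulative probability reaches $\alpha$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal_guesswork(probs, alpha):
    """Smallest number of guesses whose cumulative probability reaches alpha."""
    cum = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        cum += p
        if cum >= alpha:
            return i
    return len(probs)

def grouping_rhs(U, blocks):
    """U(Y) + sum_j q_j * U(X | y_j) for a partition of the outcomes into blocks."""
    q = [sum(block) for block in blocks]
    conditional = [[p / qj for p in block] for qj, block in zip(q, blocks)]
    return U(q) + sum(qj * U(c) for qj, c in zip(q, conditional))

# Hypothetical example: four outcomes grouped into two events.
blocks = [[0.5, 0.25], [0.125, 0.125]]
x = [p for block in blocks for p in block]
alpha = 0.8

def log_w(probs):
    return math.log2(marginal_guesswork(probs, alpha))

print(entropy(x), grouping_rhs(entropy, blocks))  # the two sides agree
print(log_w(x), grouping_rhs(log_w, blocks))      # the two sides differ
```

For entropy both sides equal 1.75 bits, whereas for $\log w_{0.8}$ the left side is $\log 3$ and the right side is 2.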

But let 3^ be the leaves of a Huffman tree T and let W be the leaves of any 
proper Huffman subtree of T with the same root. As described in Sect. EH we 
get 3^ and W -valued random variables X and Y ^ whose distributions are related 
to the various codeword and prefix lengths, respectively. In the case a = 1/2, we 
can compare m with an exact expression because wi{X) is just the number of 
leaves in the left half-tree, and thus 



w^{X) 



1 

w^(Y) 



w 1 (y) 
i=i 



(6) 



There are obvious cosmetic differences between © and The arithmetic mean 
of (01) is not necessarily the geometric-like mean of (|5]l , the number of terms is dif- 
ferent, and the weighting factors aren’t necessarily the same, i.e. qj ^ l/wi{Y). 
But we hold that the heart of the disparity is the fact that the individual values 
being “averaged” have essentially no relation to one another. For sufficiently 
large trees, it is easy to construct examples (e.g. using Fig.0|on branches) where 
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the wi{X\yj) of (SD is arbitrarily less than the wi{X\yj) of By carefully 
choosing which yj to introduce the disparity, we can drive the r.h.s.’s of (E) and 
(El arbitrarily far away from each other - in either direction. 

Shannon's grouping axiom cannot hold for $\log w_\alpha(X)$, not even as an
inequality. Yet we have argued that marginal guesswork is a reasonable and useful
measure of uncertainty. We must conclude that the axiomatic premise which
can serve to identify uncertainty as entropy is inappropriate for characterizing
the uncertainty of a secret against an adversary who is equipped to conduct a
brute-force searching attack.
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Abstract. Twofish is one of the finalists for the Advanced Encryp- 
tion Standard selection process (AES). The best up-to-date analysis of 
Twofish was presented by one of its designers, who showed that its complexity
on 6-round Twofish with 128-bit keys is $4.6 \cdot 2^{128}$ one-round computations.
In this paper we present an improvement of the attack on 6-round
Twofish, whose complexity is $1.84 \cdot 2^{128}$ one-round computations.
For other key sizes, our results have the complexity $13 \cdot 2^{160}$ one-round
computations for 192-bit keys and $24.2 \cdot 2^{192}$ one-round computations for
256-bit keys. For 7-round Twofish, the designers mentioned an attack,
and estimated its complexity to be about 2^®® simple steps for 256-bit 
keys, and for other key sizes have complexities that exceed exhaustive 
search. We present an improvement of the attack on 7-round Twofish, 
whose complexity is 2^^® '*® one-round computations for 256-bit keys. We 
also show, for the first time, that this attack is faster than exhaustive 
search for 192-bit keys for which it breaks 7-round Twofish in 2 • 2^®^ 
one-round computations, while exhaustive search takes 7 • 2 ^^^. 



Keywords: cryptanalysis, impossible differential, Twofish. 



A Introduction 

Twofish [1] is a block cipher designed by Counterpane Systems Group as a
candidate for the Advanced Encryption Standard selection process, and was 
accepted as one of the five finalists. 

The best up-to-date attacks on Twofish breaking 6 rounds for all key sizes 
(128, 192 and 256) and 7 rounds for 256-bit keys only were presented by Ferguson 
in 1^. They use a 5-round impossible differential (see I1I3I for more details on 
attacks using impossible differentials). In this paper we present an improvement, 
based on the same 5-round impossible differential and an additional 4-round 
impossible differential. Our improvement reduces the total complexity of the 
attack on 6-round Twofish for all key sizes and on 7-round Twofish for 256-bit 

* This work was supported by the European Union fund IST-1999-12324 - NESSIE. 
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keys. In addition, we show, for the first time, that the attack on 7-round Twofish 
for 192-bit keys is efficient as well. 

Note that Ferguson made the full analysis only for 6-round Twofish with 
128-bit keys and for other cases gave only his estimations about the complexity 
of the attacks. All results of Ferguson presented in this paper are computed by 
us, to allow fair comparison of his and our methods. 

Our paper is organized as follows: Section B describes the attack on 6-round
Twofish with 128-bit keys. Section C describes the attacks on 6-round Twofish
with 192-bit and 256-bit keys. Finally, Section D describes the known plaintext
attacks on 7-round Twofish with 192-bit and 256-bit keys.

B The 6-Round Attack on Twofish with 128-bit Keys

Twofish is a 16-round block cipher with pre- and post-whitening. Figure 1
outlines round i of Twofish, where

— $(a^i, b^i, c^i, d^i)$ is the 128-bit intermediate data after round i,

— $(K^{i,1}, K^{i,2})$ is the subkey of round i,

— The S-boxes are key dependent,

— MDS is a linear operation defined by a constant matrix,

— PHT is a layer that mixes two 32-bit words x, y by $x' = x + y$, $y' = 2 \cdot y + x$,

— $(r^{i,1}, r^{i,2})$ is the 64-bit output of the PHT in round i.

Fig. 1. The round i

A more detailed description of Twofish may be found in [1]. The following analysis
is performed on an equivalent variant of Twofish without the one-bit rotations 
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of the right half of the data. In [2] the author shows the equivalence between 
this variant of Twofish and the original Twofish variant. 

First, we describe the detailed analysis of Ferguson, as it is shown in [2], and
then we describe our detailed analysis.

B.1 Ferguson's Analysis

We know the 5-round impossible differential

$$(0, 0, \delta_1, \delta_2) \overset{5\ \text{rounds}}{\nrightarrow} (\delta_1, \delta_2, 0, 0),$$

where $(\delta_1, \delta_2) \ne (0, 0)$ [denoted in the equivalent variant].

Using this impossible differential a 6-round Twofish is attacked as follows:
Look for plaintext pairs with a difference $(0, 0, \delta_1, \delta_2)$, where $(\delta_1, \delta_2) \ne (0, 0)$,
that have a difference $(\delta_1, \delta_2, 0, 0)$ after the 5th round when decrypting by some
guessed subkey of the last round. If such a pair is found, the corresponding
guessed subkey cannot be the correct subkey, because the differential is impossible.
Therefore such a subkey is certainly wrong, and can be discarded from the
list of the possible subkeys. If a sufficient number of such pairs are analyzed, most
subkeys can be discarded, and the remaining keys can be exhaustively searched.
A more detailed description of the attack is:

1. Fix a 64-bit value $(a^0, b^0)$, and let $(c^0, d^0)$ take all the $2^{64}$ possible values.
Note that for this attack only a single such structure is needed.

2. Ask for the ciphertexts $(a^6, b^6, c^6, d^6)$ of the $2^{64}$ plaintexts $(a^0, b^0, c^0, d^0)$, and
insert them in a hash table indexed by $(c^0 \oplus c^6, d^0 \oplus d^6)$.

3. Any pair of encryptions without the same $(c^0 \oplus c^6, d^0 \oplus d^6)$ cannot make
the differences before the last round to be as required by the 5-round impossible
differential. These pairs are ignored, and only the pairs of encryptions with
the same $(c^0 \oplus c^6, d^0 \oplus d^6)$ are analyzed. As there are $2^{127}$ pairs in total and
64-bit restrictions, the expectation is that about $2^{63}$ pairs remain.

4. The 64-bit S-box key S is guessed and the values $r^{6,1}, r^{6,2}$ (from $c^6$, $d^6$) are
computed for each plaintext in the structure. Thus, for each remaining pair,
the situation is as described in Figure 2, where the values $r^{6,1}, r^{6,2}$
and the differences $a_i^6 \oplus a_j^6$, $b_i^6 \oplus b_j^6$ are known. The subkeys $(K^{6,1}, K^{6,2})$
that satisfy the described requirement can be easily found. There is one such
subkey $(K^{6,1}, K^{6,2})$ on average for each analyzed pair and each guess of the
S-box key S.

5. Each remaining pair eliminates a fraction of $2^{-64}$ of the possible keys for
the last round. For each last round subkey there exists one original key on
average. Thus, a single structure of $2^{64}$ plaintexts reduces the number of the
keys to $(1 - 2^{-64})^{2^{63}} = 1/\sqrt{e} \approx 0.6$ of the original number.

6. Test the remaining keys exhaustively.

The complexity of this attack is as follows: $2^{128}$ ($2^{64}$ possible S-box keys
times $2^{64}$ plaintexts) one-round computations are spent, which reduces the number
of the keys to 0.6 of the original number. So, there remain $0.6 \cdot 2^{128}$ undiscarded
keys, whose exhaustive search takes $6 \cdot 0.6 \cdot 2^{128}$ one-round computations. In
total, the complexity is $2^{128} + 3.6 \cdot 2^{128} = 4.6 \cdot 2^{128}$ one-round computations,
while an exhaustive search takes $6 \cdot 2^{128}$ one-round computations. On average,
these values are $2.8 \cdot 2^{128}$ and $3 \cdot 2^{128}$ respectively.

Fig. 2. The equation for finding last-round subkeys that are wrong due to the 5-round
impossible differential



B.2 Our Analysis 

We use the 5-round impossible differential,

$$(0, 0, \delta_1, \delta_2) \overset{5\ \text{rounds}}{\nrightarrow} (\delta_1, \delta_2, 0, 0),$$

as does Ferguson, together with the following additional 4-round impossible
differential:

$$(0, 0, \delta_1, \delta_2) \overset{4\ \text{rounds}}{\nrightarrow} (\delta_3, \delta_4, \delta_1, \delta_2),$$

for any $\delta_3, \delta_4$ such that $(\delta_3, \delta_4) \ne (0, 0)$. (If $(\delta_3, \delta_4) = (0, 0)$ we get back to
the 5-round impossible differential.)

Similar to the earlier analysis, for each pair of plaintexts $(a_i^0, b_i^0, c_i^0, d_i^0)$ and
$(a_j^0, b_j^0, c_j^0, d_j^0)$ with a difference of $(0, 0, \delta_1, \delta_2)$, where $(\delta_1, \delta_2) \ne (0, 0)$, we ask for
the ciphertexts $(a_i^6, b_i^6, c_i^6, d_i^6)$ and $(a_j^6, b_j^6, c_j^6, d_j^6)$ respectively.

We continue the analysis for each pair for which one of the following conditions
is satisfied:

— $c_i^0 \oplus c_i^6 = c_j^0 \oplus c_j^6$ and $d_i^0 \oplus d_i^6 = d_j^0 \oplus d_j^6$ (we can use the 5-round impossible
differential). This is the case studied in the earlier analysis.

— $c_i^0 \oplus c_i^6 = c_j^0 \oplus c_j^6$ and $d_i^0 \oplus d_i^6 = d_j^0 \oplus d_j^6 \oplus 2^{31}$ (we can use the 4-round
impossible differential).

— $c_i^0 \oplus c_i^6 = c_j^0 \oplus c_j^6 \oplus 2^{31}$ and $d_i^0 \oplus d_i^6 = d_j^0 \oplus d_j^6$ (we can use the 4-round
impossible differential).

— $c_i^0 \oplus c_i^6 = c_j^0 \oplus c_j^6 \oplus 2^{31}$ and $d_i^0 \oplus d_i^6 = d_j^0 \oplus d_j^6 \oplus 2^{31}$ (we can use the 4-round
impossible differential).
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The last three cases extend the earlier analysis using the 4-round impossible 
differential. 

Thus, in the 5th round, the output difference of the F function is either:
$(0, 0)$ (first condition), $(0, 2^{31})$ (second condition), $(2^{31}, 0)$ (third condition) or
$(2^{31}, 2^{31})$ (fourth condition). All these differences are preserved when they pass
through addition with the 5th round subkey, so the output difference
of the PHT in the 5th round is: $(0, 0)$, $(0, 2^{31})$, $(2^{31}, 0)$ or $(2^{31}, 2^{31})$, respectively.
The PHT affects these differences in a predetermined way, thus the output
differences of the g functions in the 5th round (i.e., the input difference of the
PHT) are $(0, 0)$, $(2^{31}, 2^{31})$, $(0, 2^{31})$ and $(2^{31}, 0)$, respectively. The MDS matrix
is linear, so we can compute the output difference of each S-box in the 5th round
(by multiplication of $MDS^{-1}$ with the output difference of the g function).
So we know the output difference of each S-box in the 5th round.
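The way most-significant-bit differences pass through the PHT can be checked experimentally. The sketch below assumes the PHT as described earlier, $x' = x + y$, $y' = x + 2y$ on 32-bit words with addition modulo $2^{32}$. The function names, the number of trials and the seed are hypothetical choices for illustration.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit words
MSB = 1 << 31      # the difference 2^31

def pht(x, y):
    """Pseudo-Hadamard transform: x' = x + y, y' = x + 2y (mod 2^32)."""
    return (x + y) & MASK, (x + 2 * y) & MASK

def output_differences(dx, dy, trials=1000, seed=0):
    """Set of XOR differences seen at the PHT output when the two inputs
    differ by the XOR difference (dx, dy), over random input words."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        u1, v1 = pht(x, y)
        u2, v2 = pht(x ^ dx, y ^ dy)
        seen.add((u1 ^ u2, v1 ^ v2))
    return seen

# Input differences of the PHT and the output difference each one produces.
for d_in in [(0, 0), (MSB, MSB), (0, MSB), (MSB, 0)]:
    print([hex(v) for v in d_in], output_differences(*d_in))
```

Flipping the most significant bit is the same as adding $2^{31}$ modulo $2^{32}$, so no carries are involved and each input difference yields a single output difference with probability 1.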

We then guess the S-box key S and compute the values $r^{6,1}, r^{6,2}$ (from $c^6$,
$d^6$) for each plaintext in the structure. Thus, for each remaining pair, we get the
situation described in Figure 3, in which we know $a_i^6$, $b_i^6$, $a_j^6$, $b_j^6$,
the S-box key S and the output differences of all the S-boxes. The only
information that we do not know in Figure 3 is the 6th round subkey $(K^{6,1}, K^{6,2})$
that would satisfy these restrictions. Appendix A shows how we can easily find
it by one operation on average.

Fig. 3. The equation for finding last-round subkeys that are wrong due to the 4-round
impossible differential

A more detailed description of the attack is:

1. Fix a 64-bit value $(a^0, b^0)$, and let $(c^0, d^0)$ take all the $2^{64}$ possible values.

2. Ask for the ciphertexts $(a^6, b^6, c^6, d^6)$ of the $2^{64}$ plaintexts $(a^0, b^0, c^0, d^0)$, and
insert them in a hash table indexed by $((c^0 \oplus c^6) \bmod 2^{31}, (d^0 \oplus d^6) \bmod 2^{31})$.

3. We have in total $2^{127}$ pairs and 62-bit restrictions, so we expect to get about
$2^{65}$ correct pairs (satisfying the condition described earlier in this section).

4. We guess the 64-bit S-box key S and calculate the supposed $r^{6,1}, r^{6,2}$ for
each plaintext from the structure.

5. Each analyzed pair eliminates a fraction of $2^{-64}$ of the possible keys for
the last round. For each last round subkey there exists one original key on
average. Thus, a single structure of $2^{64}$ plaintexts reduces the number of
keys to $(1 - 2^{-64})^{2^{65}} = 1/e^2 \approx 0.14$ of the original number.

6. Test the remaining keys exhaustively.

The total complexity is $2^{128}$ ($2^{64}$ plaintexts $\cdot\ 2^{64}$ S-box keys) $+\ 6 \cdot 0.14 \cdot 2^{128}$
(an exhaustive search of $0.14 \cdot 2^{128}$ keys through 6 rounds) $= 1.84 \cdot 2^{128}$ one-round
computations and 2'^^ preprocessing computations. 

Actually, we can use a smaller structure of $2^{63}$ plaintexts. In this case, we
get about $2^{63}$ pairs, satisfying the condition described in the beginning of this
section. Thus, a single structure of $2^{63}$ plaintexts reduces the number of undiscarded
keys to $(1 - 2^{-64})^{2^{63}} = 1/\sqrt{e} \approx 0.6$ of the original number. Hence, the
total complexity is $2^{127} + 6 \cdot 0.6 \cdot 2^{128} = 4.1 \cdot 2^{128}$ one-round computations and
2^^ preprocessing computations. This is still a smaller complexity than Ferguson 
describes, despite the fact that we use a smaller number of plaintexts. 

We observe that we can enhance the attack using several structures of $2^{64}$
plaintexts. This enhancement can be applied both to our variant of the attack and
to the variant of Ferguson. Denote the number of structures used in the attack
by p. Then, the total complexity of the attack is

— Based on our method: $p \cdot 2^{128} + 6 \cdot e^{-2p} \cdot 2^{128}$.

— Based on Ferguson's method: $p \cdot 2^{128} + 6 \cdot e^{-p/2} \cdot 2^{128}$.

Figure 4 shows the relationship between the total complexity and the number
of the structures for Ferguson's method and for our method (the 1/2 structure
point refers to the small half-size structure).

C The 6-Round Attacks on Twofish with 192-bit and 
256-bit Keys 

The attack on 6-round Twofish, described earlier, works in a similar way and 
with a similar reduction in the complexity when one structure is used and the 
key size is either 192 bits or 256 bits. However, for these cases, the same attack 
used in conjunction with several additional structures can further reduce the 
total complexity of the attack. 

C.l 192-bit Keys 

The total complexity of the attack on the 192-bit key variant using p structures is

— Based on our method: $p \cdot 2^{160} + 6 \cdot e^{-2p} \cdot 2^{192}$.

— Based on Ferguson's method: $p \cdot 2^{160} + 6 \cdot e^{-p/2} \cdot 2^{192}$.

Fig. 4. The total complexity of attack on 6-round Twofish with 128-bit keys

Figure 5 shows the relation between the number of structures used in the attack
and the total complexity of the attack for the attack presented by Ferguson and
for our improvement. From Figure 5 we can see that our best result is achieved
given 12 structures ($12 \cdot 2^{64}$ chosen plaintexts), and the complexity in this case is
about $2^{163.7}$ one-round computations. Figure 5 also shows that the complexity
of our analysis is increased when more than 12 structures are used (and similarly
for more than 48 structures with Ferguson's analysis). This phenomenon is due to
the larger number of chosen plaintexts, which cause the time required to discard
the wrong keys to be longer than the time required to exhaustively search the
remaining keys.

C.2 256-bit Keys 

The total complexity of the attack on the 256-bit key variant using p structures is

— Based on our method: $p \cdot 2^{192} + 6 \cdot e^{-2p} \cdot 2^{256}$.

— Based on Ferguson's method: $p \cdot 2^{192} + 6 \cdot e^{-p/2} \cdot 2^{256}$.

Figure 6 is similar to Figure 5 for the case of 256-bit keys. Our best result in
this case is about $2^{196.6}$ one-round computations using 24 structures.

Table 1 compares our best results with Ferguson's best results (computed by
us, as noted in the introduction) for each possible key size.

Note that these attacks are preserved when the 6-round variant of Twofish
includes pre-whitening.

Fig. 5. The total complexity of attack on 6-round Twofish with 192-bit keys

Fig. 6. The total complexity of the attack on 6-round Twofish with 256-bit keys

Key size   Our results                                Ferguson's results
in bits    complexity             chosen plaintexts   complexity             chosen plaintexts
128        $1.84 \cdot 2^{128}$   $2^{64}$            $4.6 \cdot 2^{128}$    $2^{64}$
192        $13 \cdot 2^{160}$     $12 \cdot 2^{64}$   $48.6 \cdot 2^{160}$   $47 \cdot 2^{64}$
256        $24.2 \cdot 2^{192}$   $24 \cdot 2^{64}$   $93 \cdot 2^{192}$     $91 \cdot 2^{64}$

Table 1. Comparison between ours and Ferguson's best results
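The multi-structure entries of Table 1 can be recomputed from the complexity formulas. The sketch below assumes a total cost of $p \cdot 2^{64+s} + 6 \cdot e^{-cp} \cdot 2^{n}$ one-round computations for n-bit keys with an S-box key of $s = n/2$ bits, where $c = 2$ for our method and $c = 1/2$ for Ferguson's. Costs are expressed in units of $2^{64+s}$. The function names and the search bound are hypothetical.

```python
import math

OURS, FERGUSON = 2.0, 0.5  # decay rate c per structure in exp(-c * p)

def cost(p, key_bits, rate):
    """Total cost in units of 2^(64 + key_bits/2) one-round computations:
    p for discarding keys, plus a 6-round exhaustive search of the survivors."""
    survivors = math.exp(-rate * p) * 2.0 ** (key_bits // 2 - 64)
    return p + 6 * survivors

def best_structures(key_bits, rate, max_p=200):
    """Whole number of structures p that minimizes the total cost."""
    return min(range(1, max_p + 1), key=lambda p: cost(p, key_bits, rate))

for key_bits in (128, 192, 256):
    for name, rate in (("ours", OURS), ("Ferguson", FERGUSON)):
        p = best_structures(key_bits, rate)
        print(key_bits, name, p, round(cost(p, key_bits, rate), 2))
```

For 128-bit keys the table lists the single-structure figures (p = 1), which this search does not necessarily select for Ferguson's method.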



D The 7-Round Known Plaintext Attacks on Twofish 
with 192-bit and 256-bit Keys 

The 7-round attack is based on the same impossible differentials as the 6-round 
attack. In this attack we use these impossible differentials in the middle rounds 
(starting from the second round), in contrast to the 6-round attack, in which 
the impossible differentials start in the first round. 

It is convenient to consider this attack as a variant of the 6-round attack, in
which an additional round is prepended before the 6-round cipher. We want to
get the difference $(0, 0, \delta_1, \delta_2)$ before the second round for some $(\delta_1, \delta_2) \ne (0, 0)$.
Therefore, the difference before the first round must be $(\delta_1, \delta_2, \varepsilon_1, \varepsilon_2)$, where
$(\delta_1, \delta_2) \ne (0, 0)$. In addition, the right half of the difference after the seventh
round must be one of the following values: $(\delta_1, \delta_2)$, $(\delta_1 \oplus 2^{31}, \delta_2)$, $(\delta_1, \delta_2 \oplus 2^{31})$ or
$(\delta_1 \oplus 2^{31}, \delta_2 \oplus 2^{31})$ (otherwise we cannot use the impossible differentials). Hence, a
pair of plaintexts may be used for this attack with probability $2^{-62}$ (the left part
of the input difference must be equal to the right part of the output difference,
except for their most significant bits). We call such a pair a matching pair.

For every matching pair, we guess the S-box key S and find the subkeys
of the first and the last rounds that lead to the impossible differential in the
middle rounds. Such subkeys are wrong, and thus the combination of the initial
keys that lead to these subkeys together with the guessed key S is wrong as
well. We discard these keys from the list of the possible keys. Each matching
pair eliminates about a $2^{-128}$ fraction of the possible keys. Therefore, we need
about $2^{128}$ matching pairs, i.e., about $2^{190}$ ($2^{128} \cdot 2^{62}$) pairs in total, which can
be generated using $2^{95.5}$ known plaintexts.

In [2], Ferguson maintains that such an attack (using a 5-round impossible
differential, of course) may be used only when the key size is 256 bits and it
takes about $2^{256}$ steps. In Subsection D.1 we show that such an attack is more
efficient than exhaustive search even when the key size is 192 bits, and even if we
use only a 5-round impossible differential, as Ferguson does. In Subsection D.2
we show that the complexity of the attack on the 256-bit key variant is significantly
less than $2^{256}$.

D.l 192-bit Keys 

As in the previous attack, the analysis consists of two stages: discarding most
keys and exhaustively searching the remaining ones. We denote the number of
the known plaintexts required for the attack by $2^p$. Clearly, p must be close to
95.5 (according to the above explanation). So, the first stage of the analysis takes
$2^p$ (known plaintexts) $\cdot\ 2^{96}$ (guessing of the S-box key S) $\cdot\ 2$ (for the first and the
last round) $= 2^{p+96+1} = 2^{p+97}$ one-round computations. It discards most keys
as wrong, except for a fraction of

$$(1 - 2^{-128})^{2^{2p-1} \cdot 2^{-62}} \approx e^{-2^{2p-191}}$$

of the keys (where $2^{2p-1}$ pairs are received from $2^p$ known plaintexts and the
probability of a pair being a matching pair is $2^{-62}$). Hence, the second stage
of the attack takes $7 \cdot e^{-2^{2p-191}} \cdot 2^{192}$ one-round computations (where 7 is the
number of rounds and $e^{-2^{2p-191}} \cdot 2^{192}$ is the number of the remaining keys, as
described earlier). The total complexity of the attack is $2^{p+97} + 7 \cdot e^{-2^{2p-191}} \cdot 2^{192}$
one-round computations.

We can improve this result by observing the symmetry of the cipher: We
set the 4-round impossible differential in the decryption direction from round 6
backwards and use a similar analysis replacing the first and last rounds. This
improvement allows us to discard twice as many keys using the 4-round impossible
differential (used in 3/4 of the cases), or a total of 7/4 of the number of keys
discarded by each pair in the first stage of the attack. So it reduces the fraction
of remaining keys to

$$(1 - 7/4 \cdot 2^{-128})^{2^{2p-63}} \approx e^{-7 \cdot 2^{2p-193}}.$$

Hence, the total complexity of the attack is $2^{p+97} + 7 \cdot e^{-7 \cdot 2^{2p-193}} \cdot 2^{192}$ one-round
computations.

If the attack uses only the 5-round impossible differential (as in Ferguson's
method), then the probability for some pair to be a matching pair is $2^{-64}$,
and the total complexity of the attack is $2^{p+97} + 7 \cdot e^{-2^{2p-193}} \cdot 2^{192}$ one-round
computations.

Figure 7 shows the total complexity of the three variants of the attack:



1. Using only the 5-round impossible differential - the best result requires 4.95 • 
2^®® one-round computations using 2®® ®^ chosen plaintexts. 

2. Using the 5-round and the 4-round impossible differentials - the best result 
requires 2.79 • 2^®® one-round computations using 2®®-®^ chosen plaintexts. 

3. Improved result using the 5-round and the 4-round impossible differentials 
- the best result requires 2 • 2^®® one-round computations using 2®® ® chosen 
plaintexts. 

Note that an exhaustive search takes $7 \cdot 2^{192} = 2^{194.81}$ one-round computations.
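The trade-off between the two stages of the 192-bit attack can be explored numerically. The sketch below assumes a total cost, in units of $2^{192}$ one-round computations, of $2^{p-95} + 7 \cdot e^{-c \cdot 2^{2p-191}}$ with $2^p$ known plaintexts, where $c = 1$ when both impossible differentials are used, $c = 1/4$ when only the 5-round differential is used, and $c = 7/4$ for the improved variant. These rates are our reading of the formulas in this subsection; the function names and the search grid are hypothetical.

```python
import math

def seven_round_cost(p, rate):
    """Total cost in units of 2^192 one-round computations.

    First stage:  2^(p + 97) = 2^(p - 95) * 2^192.
    Second stage: 7 rounds times the surviving fraction exp(-rate * 2^(2p - 191)).
    """
    return 2.0 ** (p - 95) + 7.0 * math.exp(-rate * 2.0 ** (2 * p - 191))

def best_exponent(rate):
    """Grid search over p in [94, 99] with step 0.001; returns (p, cost)."""
    grid = [94 + t / 1000 for t in range(5001)]
    p = min(grid, key=lambda v: seven_round_cost(v, rate))
    return p, seven_round_cost(p, rate)

for name, rate in (("5-round only", 0.25), ("5- and 4-round", 1.0), ("improved", 1.75)):
    p, c = best_exponent(rate)
    print(name, round(p, 2), round(c, 2))
```

Under these assumptions every variant stays below the 7 units needed for exhaustive search.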



D.2 256-bit Keys 

In this case, the analysis is similar to the previous one. The first stage of the
analysis takes $2^p \cdot 2^{128} \cdot 2 = 2^{p+128+1} = 2^{p+129}$ one-round computations. This
reduces the fraction of undiscarded keys to $(1 - 2^{-128})^{2^{2p-1} \cdot 2^{-62}} \approx e^{-2^{2p-191}}$.
Hence, the second stage of the attack takes $7 \cdot e^{-2^{2p-191}} \cdot 2^{256}$ one-round
computations. The total complexity of the attack is $2^{p+129} + 7 \cdot e^{-2^{2p-191}} \cdot 2^{256}$
one-round computations. In the improved analysis, the total complexity of the
attack is $2^{p+129} + 7 \cdot e^{-7 \cdot 2^{2p-193}} \cdot 2^{256}$ one-round computations. In the attack that
uses only the 5-round impossible differential the total complexity is
$2^{p+129} + 7 \cdot e^{-2^{2p-193}} \cdot 2^{256}$ one-round computations. Note that an exhaustive search takes
$7 \cdot 2^{256} = 2^{258.81}$ one-round computations.

Fig. 7. The total attack complexity for 7-round Twofish with 192-bit keys

Figure 8 shows the total complexity of all these cases, and its best results
are shown in Table 2.


one-round known 
computations plaintexts 

5-round imp. diff. only ^ ^ 

5-round and 4-round imp. diff. 2®'^ ®® 

Improved using the 5-round and 4-round imp. diff. 2 ®'^ '^® 

Table 2. The best results for the attack on 7-round Twofish with 256-bit keys 
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A The Subkey Computation in a Unit of Time 

From Figure 3 we can see that $K^{6,1}$ and $K^{6,2}$ have independent computations.
Thus, we can compute $K^{6,1}$ and $K^{6,2}$ in parallel. Without loss of generality,
we show how to find the supposed values of $K^{6,1}$; $K^{6,2}$ can be found
similarly.

As we discussed earlier, the 32-bit MDS output difference may be either 0
or $2^{31}$. In the first case (a difference 0), the 32-bit MDS input difference is 0 as
well. It means that the 8-bit output differences of all the S-boxes are zeroes, and
thus the 8-bit input differences of all the S-boxes are zeroes as well. So we get
back to the situation described in Figure 2 for one of the words (rather than for
both words as shown there). This information suffices to find all the matching
subkeys directly.

In the other case the 32-bit MDS output difference is 2^^. The 32-bit MDS 
input difference is some fixed value that may be calculated by multiplying 
MDS~^ by 2^^. We denote the received 8-bit output differences of the S-boxes 
by Aso, Asi, As 2 and Ass- During the attack, when we are looking for iF®’^, 
we know a®, a®, Aso, Asi, As 2 , Ass and S-box key S, but it still 

takes 2®^ guesses to find the subkeys matching the condition. A more efficient 
method is to use a precomputed table, which reduces the time needed during 
the attack.

We consider the conditions on the right hand side of the (XOR with a®
and S-boxes), as shown in Figure[3l Before the attack we don’t know the of, a® 
and S-box key S, but we know $\Delta s_0$, $\Delta s_1$, $\Delta s_2$ and $\Delta s_3$. We want to find all
possible pairs of 32-bit values that may lead to the difference ($\Delta s_0$, $\Delta s_1$, $\Delta s_2$,
$\Delta s_3$) for some of, a® and S-box key S. For each 32-bit difference there exist $2^{32}$
pairs satisfying the difference. We guess an S-box key S and values af, a®, and
build a table which, for any such guess, gives all possible input pairs that,
together with the guessed values, lead to the required output difference ($\Delta s_0$, $\Delta s_1$, $\Delta s_2$,
$\Delta s_3$). Doing it this way takes $2^{32}$ (possible output pairs) $\cdot\ 2^{64}$ (possible
S-box key S guesses) $\cdot\ 2^{64}$ (possible a®, a® values) $= 2^{160}$ computations, which
is too much for the 128-bit keys case. But the S-boxes are independent and
the XOR operation of 32-bit values may be represented as four independent
XOR operations of 8-bit values, so we can split the above computation into four
independent computations, each on 8-bit values. Thus we build four tables,
and each of them takes $2^{8}$ (possible output pairs) $\cdot\ 2^{16}$ (possible S-box key S
guesses) $\cdot\ 2^{16}$ (possible of, a® values) $= 2^{40}$ computations, for a total of
$4 \cdot 2^{40} = 2^{42}$ computations. In addition, we hash the received input pairs by their
additive difference. This reduces the time taken during the attack to find the 
K^’^, because when we know the and we may calculate their additive 
difference which is preserved when it passes through the addition operation. 
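
As a quick sanity check of the table-building cost discussed above, the following sketch adds up the exponents for the word-wide and the byte-wise approach. The sizes used (a 64-bit S-box key and two unknown 32-bit values in the 128-bit key case) are assumptions taken from the structure of Twofish, and the function names are purely illustrative.

```python
from math import log2

# log2 of the number of candidates; sizes are assumptions for the 128-bit key case.
PAIR_BITS = 32       # pairs of 32-bit values per output difference
SBOX_KEY_BITS = 64   # S-box key S
A_VALUE_BITS = 64    # two unknown 32-bit values XORed in before the S-boxes


def naive_table_cost_log2() -> int:
    """log2 of the work when whole 32-bit words are handled at once."""
    return PAIR_BITS + SBOX_KEY_BITS + A_VALUE_BITS


def bytewise_table_cost_log2(lanes: int = 4) -> float:
    """log2 of the work when each 8-bit lane gets its own table."""
    per_lane = PAIR_BITS // lanes + SBOX_KEY_BITS // lanes + A_VALUE_BITS // lanes
    return log2(lanes * 2 ** per_lane)


print(naive_table_cost_log2(), bytewise_table_cost_log2())
```

Splitting the work by byte lane is what brings the precomputation from an infeasible word-wide search down to four small tables.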

During the attack we know the values: a®, a®, S-box key S and additive 
difference of the and Vj’^, so we get the possible pairs from the four tables, 
and starting from the least significant 8 bits to the most significant 8 bits we get 
There is one such subkey on average, so it takes one operation on average. 

Note that the fact that an additive difference may have a “carry” does not 
disturb us. If there is an input carry to some 8-bit additive difference, we just 
consult the table with the original additive difference minus one.

Abstract. Most electronic cash (e-cash) based payment systems that
have been proposed do not possess the property of transferability. Trans- 
ferability in an e-cash based system means that when a payee receives an 
electronic coin in a transaction he may spend it without depositing the 
coin first and getting a new coin issued from a bank. Usually electronic 
coins that are transferred in a transaction have a lifetime of the trans- 
action itself. In this paper, we propose a payment system where coins 
can be transferred over multiple hands, spread over various transactions, 
similar to physical cash. Detection or prevention of double spending of 
coins is a critical issue in online e-cash payment systems. In our system 
the verification is distributed across multiple entities as opposed to the 
case of a coin-issuing entity or a central bank alone being responsible 
for the verification. A resolution mechanism for handling disputes is also 
presented. The proposed system provides guarantees of anonymity, fair- 
ness and transferability. 

Key words: E-cash, Transferability, Fairness, Anonymity, Double Spend- 
ing. 



A Introduction 

With the exponential increase in the number of users on the Internet and the 
market place shifting to the Internet, the role of an electronic payment system is 
crucial. Anonymity, transferability, fungibility (breaking large denominations to 
smaller ones) are some of the properties of paper cash, which an electronic pay- 
ment system should possess. From security considerations, cryptographic primi- 
tives are used in electronic payment systems. If a payment system is to succeed 
on the Internet then the computational efforts in using these primitives need to 
be optimized. 

In this work, we consider electronic payment systems in which the payment 
instrument is e-cash. Such systems are classified into two types, viz, online and 
offline. Online e-payment systems are those in which the transfer of electronic 
money between the payer and payee takes place in the presence of a third party, 
usually a bank, that guarantees the authenticity of the coins being transferred. In
contrast, in offline systems the transaction occurs between the two parties, payer
and payee, alone. The money transferred is verified when the payee deposits the 
coins with a bank. Transferability of coins is a missing feature in most of the 
systems that have been proposed so far, whether online or offline. In such payment
schemes, the lifetime of a coin is the lifetime of the transaction it is involved
in. This is in contrast to paper cash, where the money retains its value
over several transactions and merely changes hands. An obvious advantage of
transferable cash is that a coin-issuing authority need not issue new coins for 
every transaction that takes place. 

An often-cited disadvantage of online systems is the heavy load placed on the
central bank or coin-issuing authority that checks the authenticity of each coin.
This is certainly a bottleneck, since each coin in a transaction needs to be
verified by a central server.
In the proposed scheme, we do away with a central verifying server and balance 
the load across several entities. We also require a dispute resolving mechanism in 
place for the electronic payment system. Given the fact that the payer and payee 
do not even know each other’s real identities as they transact over the Internet,
the payment system should be able to give guarantees to both the parties in 
the transaction. We outline such a dispute resolution protocol as a part of our 
proposed payment system. 

We briefly review some of the existing e-cash based payment systems. For 
brevity, we choose the principal systems [3,6]. Certain other related and inter- 
esting systems are [2,5,7,8,10,11]. 

David Chaum’s Ecash[6] is a fully anonymous, secure online electronic cash 
system. It implements anonymity using blind signature techniques. The Ecash 
system consists of three main entities: 

— Banks who issue coins, validate existing coins and exchange real money for 
Ecash. 

— Buyers who have accounts with a bank, from which they can withdraw and 
deposit Ecash coins. 

— Merchants who can accept Ecash coins in payment for information, or hard 
goods. Merchants can also run a pay-out service where they can pay a client 
Ecash coins. 

To withdraw a coin, the user generates a coin (message), m, consisting of a ran- 
dom serial number, r, multiplied by a blinding factor, b, and the denomination. 
This message, m, is signed by the user using his private key and sent to the 
bank after encrypting the message using the bank’s public key. The bank signs 
the blinded coin and debits the user’s account. The user unblinds the coin by 
dividing by an appropriate blinding factor. Thus, the bank cannot link the Ecash 
to the user. 
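
The blinding and unblinding steps described above can be illustrated with a toy RSA blind signature. This is only a sketch under stated assumptions: the text does not give the exact arithmetic, so the usual Chaum-style construction (multiply by the blinding factor raised to the public exponent, divide the signature by the blinding factor) is assumed here, the key is a textbook-sized insecure example, and all function names are hypothetical.

```python
import math
import secrets

# Toy RSA key for the bank (tiny textbook numbers, insecure; illustration only).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def random_blinding_factor() -> int:
    """Pick a blinding factor b that is invertible modulo N."""
    while True:
        b = secrets.randbelow(N - 2) + 2
        if math.gcd(b, N) == 1:
            return b


def blind(m: int, b: int) -> int:
    """User side: hide the serial number m before sending it to the bank."""
    return (m * pow(b, E, N)) % N


def bank_sign(blinded: int) -> int:
    """Bank side: sign without ever seeing m."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, b: int) -> int:
    """User side: divide out the blinding factor to get a signature on m."""
    return (blind_sig * pow(b, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    """Anyone can check the bank's signature with the public key."""
    return pow(sig, E, N) == m % N


serial = 1234
b = random_blinding_factor()
sig = unblind(bank_sign(blind(serial, b)), b)
print(verify(serial, sig))
```

Because the bank only ever sees the blinded value, it cannot later link the unblinded coin to the withdrawal.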

While spending, the coins are securely transferred to the merchant. The mer- 
chant verifies the coins by sending them to the bank. After ascertaining that the 
coins are not double spent, the bank credits the merchant’s account and the coin 
is destroyed. If the coin is double spent the bank sends an appropriate message 
to abort the transaction.

The advantage of Ecash is that it is fully anonymous and secure as it uses
public key cryptography. The downside is that the database of spent coins gets 
bigger and new coins have to be issued for every transaction. 

Stefan Brands proposed an offline e-cash payment system [3]. In this scheme,
three participants are involved: the computer at the bank, the computer of an
Internet service provider and the machine of the user. The user’s machine is
interfaced with a tamper resistant device. The tamper resistant device increases 
the counter at withdrawal time by the amount that is withdrawn and decreases 
the counter when a payment is made. To make a payment from the user to the 
Internet service provider and for the latter to verify that the payment is genuine, 
a secret key is installed in the device. When a specified amount is transferred 
this key is used to sign the amount. The service provider can now use the bank’s 
public key to verify the authenticity of the electronic money so transferred. The 
user does not know the secret key and hence cannot produce the signature. 
After the digital signature is verified, the service provider accepts and provides 
the requested service to the user. 

The advantage of this system is that transactions do not require the presence
of a third party for verification. Offline operation thus incurs lower
communication overhead. However, if the tamper resistant device is broken by
anybody, the device of every user of the system will have to be replaced.

Our system combines the anonymity provided by Ecash [6] with
transferability. This obviates the withdrawal and deposit protocols that [6]
requires for each transaction. For handling disputes during a transaction,
we propose a resolution protocol. In an online system, a single entity is
responsible for verification of coins. In the system that we propose, this load is 
balanced across several verifying authorities. 

This paper is organized as follows. In Sect. B we describe the e-payment setup
and the basic coin exchange, coin withdrawal, coin deposit and dispute
resolution protocols. In Sect. C we describe the features of our proposed payment
system (anonymity, fairness and transferability) and how it achieves them. In
Sect. D we discuss some optimizations and extensions to the proposed scheme.

B The e-Payment System 

In this section, we present our online payment system. Apart from ensuring 
security and anonymity, the system will also incorporate the feature of transfer- 
ability. 

B.1 Setup and Notations

There are three parties involved in the basic coin exchange mechanism: the 
Payer, designated as the Customer (C), the Payee, called a Merchant (M) and a 
Verifying Authority (VA). In a transaction the coins are transferred from C to 
M and the coins are verified by the VA. The VA’s job is two-fold: First, he has 
to verify that the coin has not been spent previously and next, he needs to affix
his signature along with requisite information on the coin to allow the merchant
M to spend the coin later. 

Before describing the protocol, we set up notations for the rest of the paper. 
A signature on a message, MESG, by an entity X will be denoted as S_X(MESG).
H(.) denotes a strong collision resistant one-way hash function. A symmetric key
between two entities X and Y will be denoted by K_{XY}.

In our system the coin, COIN, is a bit string consisting of three parts. The
format of the coin is shown in Fig. 1 below.



{ S_B(SNO, DENM, EXPD, TS), S_{VA_0}(VA_1, TS_0, H(SNO, TS)), VA_0 }
Fig. 1. Coin Format



The first part has the fields serial number (SNO), denomination of the coin
(DENM), expiry date of the coin (EXPD) and the timestamp of issue (TS).
This part is signed by the bank using its private key. The second part consists
of the name of the next verifying authority (VA_1), a timestamp (TS_0) at which
verification is done and the hash of the serial number (SNO) and timestamp of issue
(TS), which are fields of the first part. The second part is then signed by the
present verifying authority (VA_0). At issue stage, the present verifying authority
is the bank B itself. Also, the timestamp (TS_0) at issue stage is the timestamp
of issue, TS. The third part of the coin is the name of the present verifying
authority (VA_0). The first part of the coin remains unchanged throughout the
lifetime of the coin, i.e. till its expiry date.
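
The three-part coin format can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: an HMAC tag stands in for the public-key signatures S_B and S_{VA_0} (an assumption made so the example needs only the standard library), SHA-256 is assumed for H, and the field and function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


def H(*fields) -> str:
    """Collision-resistant hash over the given fields (SHA-256 assumed)."""
    return hashlib.sha256("|".join(map(str, fields)).encode()).hexdigest()


def sign(key: bytes, *fields) -> str:
    """Stand-in for S_X(...): an HMAC tag instead of a real signature."""
    msg = "|".join(map(str, fields)).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class Coin:
    # Part 1: S_B(SNO, DENM, EXPD, TS), unchanged for the coin's lifetime.
    sno: str
    denm: int
    expd: str
    ts: int
    bank_sig: str
    # Part 2: signed by the present verifying authority, replaced on each transfer.
    next_va: str
    ts_verified: int
    sno_ts_hash: str
    va_sig: str
    # Part 3: name of the present verifying authority.
    current_va: str


def issue_coin(bank_key: bytes, sno: str, denm: int, expd: str,
               ts: int, first_va: str) -> Coin:
    """At issue the bank B acts as VA_0 and TS_0 equals the issue timestamp."""
    digest = H(sno, ts)
    return Coin(
        sno=sno, denm=denm, expd=expd, ts=ts,
        bank_sig=sign(bank_key, sno, denm, expd, ts),
        next_va=first_va, ts_verified=ts, sno_ts_hash=digest,
        va_sig=sign(bank_key, first_va, ts, digest),
        current_va="B",
    )


coin = issue_coin(b"bank-demo-key", "SN-0001", 10, "2001-12-31", 1000, "VA1")
print(coin.current_va, coin.next_va)
```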

B.2 Coin Exchange Protocol 

The basic coin exchange protocol is explained below. Figure 2 shows the messages
exchanged between C, M and VA_i, the current verifying authority of the
coin. We now detail the steps.

Step 1: The customer C requests goods from merchant M.

Step 2: M sends a signed message containing the transaction number (TID),
the description (DESC) of the goods, price (PRICE) and a time stamp (TS).

Step 3: C sends a message encrypted using the public key of VA_i,
at which the coins are to be verified. This message consists of COIN and a hash
of the TID, DESC, PRICE and TS. Along with the encrypted message, the name
of VA_i is also sent unencrypted so that M knows where he needs to send the
coin for verification.

Step 4: M sends to VA_i two items: (i) the encrypted message received from
C in Step 3 and (ii) a composite message consisting of the name of the next
verifying authority, VA_{i+1}, the above hash of the TID, DESC, PRICE and TS
and a symmetric key K_{MV}. VA_{i+1} is the verifying authority
to which M would like to send the coin for verification when M spends it. K_{MV}
is generated by M for one session of the coin exchange protocol and is used for
sending the signed coin back to M. This composite message is encrypted with
the public key of VA_i.

Step 5: VA_i decrypts the encrypted messages, verifies after consulting his
database that the coins have not been spent previously, and signs the coin.



Step 1: C → M : Request(GOODS)
Step 2: M → C : S_M(TID, DESC, PRICE, TS)
Step 3: C → M : E_{VA_i}(COIN, PRICE, H(TID, DESC, PRICE, TS)), VA_i
Step 4: M → VA_i : E_{VA_i}(COIN, PRICE, H(TID, DESC, PRICE, TS)),
                   E_{VA_i}(VA_{i+1}, PRICE, H(TID, DESC, PRICE, TS), K_{MV})
Step 5: VA_i : Verifies COIN, sends OK/REJECT
        VA_i → M : Sends signed COIN on OK, encrypted with K_{MV}
Step 6: M → C : Receipt for GOODS or REJECT

Fig. 2. Basic Coin Exchange Protocol


The verification of the COIN is done as follows. First, VA_i verifies from
the second part that he is indeed the current valid verifying authority. Next,
VA_i checks if the SNO appearing on the COIN is listed in his database. If not,
then the COIN is authentic and he proceeds to sign it. If the SNO is
listed in his database, he checks the timestamp appearing on the second part
of the COIN. If this timestamp is greater than the timestamp in the database
corresponding to the SNO of the COIN, then the COIN is authentic. If neither
condition holds, the COIN is treated as double spent and
a REJECT signal is conveyed to M. The entries that need to be made in the
verifying authority’s database when a COIN is found to be authentic are: (a)
the SNO, if it is not already in the database and (b) the timestamp (TS) of
previous verification which appears in the second part.

If the COIN is verified as being authentic, the verifying authority, VA_i, needs
to sign it. This signing is done by replacing the second part of the COIN. The
second part will now contain the new timestamp (TS_i) at which the COIN is
being verified, the next verifying authority’s name (VA_{i+1}) and the hash value
that existed in the replaced part. Of course, he may also verify that the hash
value is correct by doing the hash computation. The third part of the COIN is
replaced by the name of the present verifying authority, VA_i. Figure 3 shows the
coin being signed by VA_i.



Coin being verified by VA_i:
{ S_B(SNO, DENM, EXPD, TS), S_{VA_{i-1}}(VA_i, TS_{i-1}, H(SNO, TS)), VA_{i-1} }

Coin after signing by VA_i:
{ S_B(SNO, DENM, EXPD, TS), S_{VA_i}(VA_{i+1}, TS_i, H(SNO, TS)), VA_i }

Fig. 3. Coin verification/signing at VA_i



After verification and signing, VA_i sends an OK signal and the signed COIN
back to M, encrypting it with the key K_{MV} sent by the merchant. If the coin is
not authentic a REJECT signal is sent to M.

Step 6: M on receipt of an OK signal sends a signed receipt for the goods 
to C. Similarly, on receipt of a REJECT signal M informs C accordingly. 

Thus, if the COIN is not double spent, then it can be used by the current 
owner M for his next transaction. 
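
The double-spending check and re-stamping performed by VA_i in Step 5 can be sketched as follows. This is a simplified illustration under stated assumptions: signatures and encryption are omitted, the database is modelled as a dictionary from SNO to the last recorded timestamp, a coin addressed to a different authority is assumed to be rejected, and the names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Coin:
    sno: str           # serial number from the first part
    next_va: str       # authority named in the second part
    ts_verified: int   # timestamp carried in the second part
    current_va: str    # third part: authority that last signed the coin


def verify_and_resign(me: str, db: Dict[str, int], coin: Coin,
                      next_va: str, now: int) -> Optional[Coin]:
    """Return the re-stamped coin on OK, or None on REJECT."""
    # The second part must name this authority as the valid verifier.
    if coin.next_va != me:
        return None
    # A known SNO is acceptable only with a strictly newer timestamp.
    last_seen = db.get(coin.sno)
    if last_seen is not None and coin.ts_verified <= last_seen:
        return None
    # Record the SNO with the timestamp of the previous verification.
    db[coin.sno] = coin.ts_verified
    # Replace the second and third parts; the first part is untouched.
    return replace(coin, next_va=next_va, ts_verified=now, current_va=me)


db: Dict[str, int] = {}
fresh = Coin(sno="SN-0001", next_va="VA1", ts_verified=100, current_va="B")
first = verify_and_resign("VA1", db, fresh, next_va="VA2", now=150)
replay = verify_and_resign("VA1", db, fresh, next_va="VA3", now=160)
print(first is not None, replay is None)
```

A copy of an already verified coin carries the old timestamp, so the second presentation fails the comparison and is rejected.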



B.3 Withdrawal/Deposit Protocols 

These protocols are executed infrequently since the payment mechanism guar- 
antees transferability of coins. This implies that the coin issuing authority need 
not create new coins for every transaction as in e-payment mechanisms which do 
not possess this transferability property. The deposit and withdrawal protocols
are described below (Figs. 4 & 5).

Coin Withdrawal 

Step 1: The user C sends a request for coins, indicating the amount, the change 
required and his account number with the bank/coin issuing authority B.

Step 2: B verifies the outstanding balance in the account and if the requested
amount is available in the account, issues the coins to the customer. The bank 
is provided with a public key of C when the account is opened which is used to 
encrypt the coins while sending them to C. It is possible to include the blind 
signature mechanism to provide for anonymity of C as in [1,4,9]. 



Step 1: C → B : E_B(AMT, SPEC, ACCTNO, VA_1)
Step 2: B → C : E_C(COINS)

Fig. 4. Coin Withdrawal Protocol



Coin Deposit 

Step 1: The customer C, who wants to deposit the coins, packs the coins and 
encrypts them using bank B’s public key. He provides the account number to 
which the deposit is to be made. 

Step 2: The bank B verifies the authenticity of the coins with the appropriate
verifying authority VA_i. On confirmation from VA_i, the bank credits the
amount to the account specified by C.



Step 1: C → B : E_B(COINS, ACCTNO), VA_i
Step 2: B → VA_i : Verify COINS
Step 3: If OK from verifying authority, B credits account ACCTNO with appropriate amount

Fig. 5. Coin Deposit Protocol

B.4 Resolution Protocol

The e-payment mechanism proposed provides guarantees of fairness to both the 
payer and payee. Thus, if coins are transferred from one entity to another in 
a transaction and a dispute arises, neither party loses any money or goods. In 
this subsection we detail the resolve protocol that has to be executed in case of 
disputes. This protocol also handles aborted transactions due to system failures. 

The payee in our system is guaranteed an authentic payment since the ver- 
ifying authority authenticates the coins. When the payee receives an OK signal 
from some VA_i he also receives the properly time-stamped coins, which he may use
later for future transactions. On a REJECT signal from VA_i, the same may be
used by the payee to prove that the transaction was not completed and therefore, 
refuse transfer of goods. 

The second part of the guarantee is what happens if a payee refuses to transfer 
goods even though valid coins have been handed over by the verifying authority. 
In Step 2 of the basic coin exchange protocol, the merchant M returns a signed 
message containing the description of goods (DESC), transaction ID (TID), the 
price (PRICE) and the time stamp (TS) at which the transaction occurs. The 
merchant cannot deny sending the message since it carries his signature. If the 
customer C can prove that the coins he had sent to M were valid coins and were 
authenticated by the verifying authority, then he can lay claim to the goods ne- 
gotiated during the transaction. To this end, he executes the following resolution 
protocol: 

Resolution Protocol 

Step 1: C sends a resolve request with the signed message obtained from merchant
M and a hash of the coins he had used in the transaction to the verifying
authority VA_i to which the coin was sent.

Step 2: VA_i on receipt of the request checks the signature of M and, if found
valid, checks if the coins spent were authenticated by it. If the claim of C is
found correct, VA_i directs M to transfer the goods. If M does not comply, then
the coins transferred to it are invalidated by sending an appropriate message
to VA_{i+1}. Also, the coins spent by C are restored to it so that C does not lose
money in the aborted transaction.
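
The decision logic of the resolution protocol can be sketched as below. Several details are assumptions, since the text does not specify them: the verifying authority is assumed to keep a set of hashes of the coins it has authenticated, an HMAC tag stands in for the merchant's public-key signature, and the outcome labels and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Set


def coin_hash(coin_bytes: bytes) -> str:
    """Hash of the coins that C quotes in the resolve request."""
    return hashlib.sha256(coin_bytes).hexdigest()


def sign(key: bytes, message: str) -> str:
    """Stand-in for S_M(...): an HMAC tag instead of a real signature."""
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def resolve(authenticated: Set[str], merchant_key: bytes, offer: str,
            offer_sig: str, claimed_coin_hash: str,
            merchant_complies: bool) -> str:
    """Outcome of a resolve request handled by VA_i."""
    # Step 2a: the signed offer must really come from the merchant.
    if not hmac.compare_digest(sign(merchant_key, offer), offer_sig):
        return "CLAIM_REJECTED"
    # Step 2b: the coins must have been authenticated by this authority.
    if claimed_coin_hash not in authenticated:
        return "CLAIM_REJECTED"
    # Step 2c: the merchant either delivers or loses the coins.
    if merchant_complies:
        return "GOODS_TRANSFERRED"
    return "COINS_INVALIDATED_AND_RESTORED"


key = b"merchant-demo-key"
offer = "TID=42|DESC=book|PRICE=10|TS=1000"
log = {coin_hash(b"SN-0001")}
print(resolve(log, key, offer, sign(key, offer), coin_hash(b"SN-0001"), False))
```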

C Features 

In this section we describe the main guarantees of our proposed system and how 
it achieves them. 

C.1 Anonymity

Anonymity is provided to both payer and payee as long as both interact truthfully.
At no point during the transaction does the payee come to know about
the payer’s identity. The payer merely transfers the encrypted version of coins
used during the transaction and the coins themselves do not contain the iden- 
tity of the payer. Thus, neither the payee nor the verifying authority can break 
the anonymity of the payer. In any case, a payee with honest intentions will be 
satisfied if he gets paid for the goods being transferred and has no advantage 
in getting to know the identity of the payer. Similarly, the verifying authority 
is concerned about the validity of the coins being spent and the identity of the 
payer is immaterial for this purpose. 

However, the identity of the payee is known to the payer. Without this knowl- 
edge the payer cannot transfer the coins to the payee. What the payee will be 
interested in is whether he remains anonymous to the verifying authority. This 
is guaranteed by the protocol since the merchant’s identity is never revealed in 
the basic coin exchange protocol. If the merchant tries to abort the transaction 
on receipt of valid coins from the verifying authority then the resolution protocol 
initiated by the customer would reveal the identity of the payee to the verifying 
authority. Thus, the merchant remains anonymous to the verifying authority as 
long as he involves himself in the transaction in a fair manner. 



C.2 Fairness 

Our proposed system guarantees fairness to both the parties, the customer C 
and the merchant M. By fairness we mean that irrespective of the final outcome 
of a transaction, whether completed or aborted, none of the parties involved in a 
transaction suffer a loss in terms of money or goods. The merchant M transfers 
goods to C only on receipt of coins authenticated by the verifying authority. Thus 
the e-payment mechanism guarantees that he does not have to deliver goods if 
proper payment is not made. A resolution protocol initiated by a customer on 
a “double spent” coin will be detected by the verifying authority. Also, the 
merchant himself will hold a REJECT signal signed by the verifying authority 
for the invalid transaction. This is proof enough for the merchant to not deliver 
the goods. Hence, the payment mechanism ensures fairness to the merchant. 

The Customer C is guaranteed fairness by the resolution protocol. If he has 
indeed transferred valid coins, that are not double spent ones, then by initiating 
the resolution protocol the erring merchant M can be made to deliver the goods. 
A merchant will not dishonour a transaction since he will run the danger of being 
blacklisted. This may adversely affect his future business prospects. 



C.3 Transferability 

A major and significant feature of our proposed mechanism is the transferability 
of coins. Transferability of a coin is achieved by time stamping it at every trans- 
action by a verifying authority. Transferability would imply that there would be 
no need for the withdrawal and deposit protocols for every transaction. Also, for 
achieving this feature there is no dependence on a unique trusted third party.
The load of verification of the coins is evenly spread across several verifying
authorities so that congestion does not become a bottleneck. 

D Extensions/Optimizations 

In this section we outline some extensions to the basic coin exchange protocol 
to cover the situation where more than one coin is spent in a transaction. We 
also discuss a hierarchy within the verifying authorities to optimize the coin 
verification stage. 

D.1 Payment with Multiple Coins

When more than one coin is spent in a transaction it is not practical to encrypt
all the coins using a public key of the verifying authority. Instead, the customer
C generates a symmetric key, K_{CV}, and encrypts this key and the hash value
as described in Step 3 of the basic coin exchange protocol using the public key
of the verifying authority. The coins are then encrypted using the symmetric
key K_{CV}. This key could also be used to send the change back to
C if exact change cannot be tendered by C. Since the verifying authority does
not know the identity of the customer, the “change” coins will be sent via the
merchant, who will piggyback these along with the signed receipt of the goods.
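
The hybrid scheme described above, in which only the short key K_{CV} is public-key encrypted while the coins travel under the symmetric key, can be sketched as follows. Everything cryptographic here is a toy assumption chosen so the example runs with the standard library alone: textbook RSA over two small Mersenne primes and an XOR keystream derived from SHA-256. Neither is secure, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair for the verifying authority (insecure, illustration only).
P, Q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(c ^ p for c, p in zip(chunk, pad))
    return bytes(out)


def customer_pack(coins: bytes):
    """C: wrap a fresh K_CV with the VA's public key, encrypt coins with K_CV."""
    k_cv = secrets.token_bytes(16)
    wrapped_key = pow(int.from_bytes(k_cv, "big"), E, N)
    return wrapped_key, keystream_xor(k_cv, coins)


def va_unpack(wrapped_key: int, ciphertext: bytes) -> bytes:
    """VA: recover K_CV with the private key, then decrypt the coins."""
    k_cv = pow(wrapped_key, D, N).to_bytes(16, "big")
    return keystream_xor(k_cv, ciphertext)


coins = b"COIN-1|COIN-2|COIN-3|" * 3
wrapped, ciphertext = customer_pack(coins)
print(va_unpack(wrapped, ciphertext) == coins)
```

The same K_{CV} could be reused by the verifying authority to encrypt any change coins returned to C via the merchant.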

D.2 Fungibility 

Fungibility (i.e. the feature of breaking a coin into coins of smaller denomina- 
tions) can be achieved in the following manner. The user is provided with a 
tamper resistant device which will take as input, from the user, the coin that is 
to be split into smaller denominations. The exact change necessary is specified 
by the user. The device has a secret key embedded within it that it uses to sign 
the smaller denominations. The coins that are signed do not have the sanctity 
of a coin signed by the bank. But, the verifying authority can ensure that the 
change produced is from a genuine coin. The change produced by the device 
contains all the information of the coin that is being “changed” and also the 
details of the change itself. It should also contain a nonce so that the same coin 
cannot be used to make a different set of change later. Finally, the change is 
signed using the key installed within the device before returning it to the user. 

D.3 Organization of Verifying Authority 

In practice, most transactions would involve exchange of more than a single coin. 
In this case, for improving the efficiency of the coin verification stage a distribu- 
tion of the work across a hierarchy is necessary. For each denomination of coins a 
different entity verifies coins of that denomination. During peak hours, depend- 
ing on the load at each of these entities the verification could be distributed 
evenly across them.

E Conclusions

We have described an electronic payment system with the transferability property.
This removes the need to run the withdrawal and deposit protocols for every
transaction. Also, new coins need
not be issued for every transaction as in payment schemes without the transferability
property. We have removed the unfavourable load that would be placed
on the authority that verifies coins by distributing the load across several en- 
tities. The computations involved compare favourably with schemes of similar 
nature. Fairness to both merchant and customer is ensured by the resolution 
protocol. Thus, the system ensures that neither money nor goods are lost by 
either party. This holds good for transactions that are completed normally or 
aborted. The proposed system also provides security and anonymity guarantees 
to the participants in a transaction.

Abstract. Most electronic cash systems in the literature have been developed
in the single bank model in which clients and merchants have
accounts at the same bank. In the real world, electronic cash may be is- 
sued by a large group of banks, monitored by the Central Bank. Thus not 
only client anonymity but also bank anonymity should be considered to 
simulate anonymity of real money. Because anonymity could be in con- 
flict with law enforcement, anonymity of both clients and banks must be 
controllable, such that the identity of the bank is concealed but identifiable
by the Central Bank in case of dispute and the client is anonymous
but revocable by a trusted third party. In this paper we study an electronic
cash system in the multiple bank model in which clients and merchants
have their accounts at different banks, especially from the viewpoint of 
anonymity control. Client anonymity control and bank anonymity control 
are achieved by fair blind signatures and group signatures, respectively. 
By merging fair blind signatures and group signatures, we provide both 
client anonymity control and group anonymity control efficiently. 



A Introduction 

Most proposed electronic cash systems in the literature have been developed 
on the assumption that clients and merchants have their accounts at the same 
bank. Several electronic cash requirements such as anonymity, transferability, 
divisibility, anonymity control have been analyzed on this assumption. But in 
the real world there exist several banks issuing real cash, so an electronic cash
system in a more complicated model, the so-called multiple bank model, should
be considered to reflect real cash more closely. In this paper we develop an electronic
cash system in the multiple bank model and analyze the system, especially from 
the viewpoint of anonymity control. 

Anonymity in electronic cash systems is considered useful with the argu- 
ment that real cash is also anonymous and that clients of the systems prefer to 
keep their everyday payment activities private. But anonymity could be used for 
blackmailing or money laundering by criminals without revealing their identities. 
To make e-cash systems acceptable to government, anonymity control should be 
provided such that a trusted third party, called a trustee, has an ability to revoke 
the identities of clients in case of unlawful or suspicious transactions.

In the single bank model all of the clients have their accounts at the same
bank and hence anonymity means client anonymity. But in the multiple bank 
model clients and merchants may have their accounts at different banks, so an
electronic cash system should provide both client anonymity and bank anonymity
like real cash, and therefore both client anonymity control and bank anonymity
control. Without bank anonymity, client anonymity could be restricted, since whoever
sees cash issued at a certain bank knows that the owner of the cash is
someone who has an account at that bank.

Anonymity control of clients: An electronic cash system providing client 
anonymity control is called a fair electronic cash system. It is important from 
an operational point of view that the trustee is off-line, i.e., involved 
neither in transactions nor in the opening of accounts. Two fair off-line 
payment systems were studied independently in [4,8]. The system in [4] uses a 
“coin identifier” and is more efficient in communication and computation 
overhead than the one in [8]. The system in [8] embeds the identity of a client 
in coins and hence searches the relatively small account database using the 
revoked identity, while the system in [4] has to search the withdrawal 
transaction database to trace the owner of coins. Recently some efficient fair 
off-line payment systems [2] were suggested by combining the techniques in [4] 
and [8]. 

Anonymity control of banks: For bank anonymity control we use a group 
signature. It allows members of a group to sign messages anonymously on behalf 
of the group. The signed messages are then verified by a group public key. In 
case of dispute, only a designated group manager can reveal the identity of the 
member. Various group signature schemes have been investigated to develop 
the efficient scheme of which the length of signatures and the size of the group 
public key are independent of the size of the group. But only a few schemes m 
satisfy both requirements. Group signature schemes should also be coalition 
resistant. In other words, no subset of group members, even colluding with the 
group manager, should be able to generate valid group signatures that are 
untraceable or that are traced to the identity of another group member. The 
scheme in [3] is a provably coalition-resistant group signature scheme. 

In this paper we develop a generic fair off-line e-cash system in the multiple 
bank model with both client and bank anonymity control. For client anonymity 
control, the generic fair e-cash system is first constructed. For bank anonymity 
control we modify the blind group signature scheme in [10] in order to make 
coalition attacks hard. The scheme in [10] is the blinded version of [5], which 
is known to be vulnerable to a coalition attack [1]. This modified blind group 
signature scheme, the protocol W in this paper, is merged with the generic fair 
off-line e-cash system, thus efficiently providing both client and bank 
anonymity control in the multiple bank model. 



B Signatures of Knowledge 

To build the system proposed in this paper, we use proof systems in which one 
party can convince other parties that he knows certain values without leaking 
useful information. In this section the well-known signature of knowledge 
systems introduced in [4,5] are defined. Since they are used both for signing 
a message and for proving knowledge of a secret, they are called signatures of 
knowledge. 

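Definition 1. A signature of the knowledge of representations of 
$y_1, \ldots, y_w$ with respect to the bases $g_1, \ldots, g_v$ on the message 
$m$ is denoted as follows: 

\[ SKREP\Big[(\alpha_1, \ldots, \alpha_u) : \Big(y_1 = \prod_{j=1}^{l_1} g_{b_{1j}}^{\alpha_{e_{1j}}}\Big) \wedge \cdots \wedge \Big(y_w = \prod_{j=1}^{l_w} g_{b_{wj}}^{\alpha_{e_{wj}}}\Big)\Big](m) \]

where the indices $e_{ij} \in \{1, \ldots, u\}$ refer to the elements 
$\alpha_1, \ldots, \alpha_u$ and the indices $b_{ij} \in \{1, \ldots, v\}$ refer 
to the base elements $g_1, \ldots, g_v$. The signature consists of a 
$(u+1)$-tuple $(c, s_1, \ldots, s_u) \in \{0,1\}^{k} \times \mathbb{Z}_n^{u}$ 
satisfying the equation 

\[ c = H\Big(m \| y_1 \| \cdots \| y_w \| g_1 \| \cdots \| g_v \| y_1^{c} \prod_{j=1}^{l_1} g_{b_{1j}}^{s_{e_{1j}}} \| \cdots \| y_w^{c} \prod_{j=1}^{l_w} g_{b_{wj}}^{s_{e_{wj}}}\Big). \]

For example, $SKREP[(\alpha_1, \alpha_2) : y_1 = g_2^{\alpha_1} \wedge y_2 = g_1^{\alpha_1} g_2^{\alpha_2}](m)$ 
is used for proving the knowledge of the discrete logarithm of $y_1$ to the 
base $g_2$ and the representation of $y_2$ to the bases $(g_1, g_2)$ without 
revealing any information about $(\alpha_1, \alpha_2)$, where the $g_1$ part of 
this representation is equal to the discrete logarithm of $y_1$. The SKREP can 
be computed only if the secrets $(\alpha_1, \alpha_2)$ are known. 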
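
As a rough illustration of how such a signature of knowledge can be computed and checked, the following Python sketch signs a message while proving knowledge of two secrets $(\alpha_1, \alpha_2)$ with $y_1 = g_2^{\alpha_1}$ and $y_2 = g_1^{\alpha_1} g_2^{\alpha_2}$, in the Schnorr style. The modulus `P`, the bases `G1` and `G2`, the 64-bit SHA-256 challenge and the function names are assumptions made for this example only; they are toy choices and not parameters of the paper.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only (not the paper's group):
# the multiplicative group modulo the Mersenne prime P has order N = P - 1.
P = 2**127 - 1
N = P - 1
G1, G2 = 3, 5  # hypothetical bases g1, g2


def H(*parts):
    """Hash a transcript to a 64-bit challenge (stand-in for the hash H)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def skrep_sign(m, a1, a2):
    """Sign m, proving knowledge of (a1, a2) with y1 = G2^a1, y2 = G1^a1 * G2^a2."""
    y1 = pow(G2, a1, P)
    y2 = pow(G1, a1, P) * pow(G2, a2, P) % P
    r1, r2 = secrets.randbelow(N), secrets.randbelow(N)
    # Commitments reuse r1 wherever a1 appears, which ties the two exponents together.
    t1 = pow(G2, r1, P)
    t2 = pow(G1, r1, P) * pow(G2, r2, P) % P
    c = H(m, y1, y2, G1, G2, t1, t2)
    return y1, y2, (c, (r1 - c * a1) % N, (r2 - c * a2) % N)


def skrep_verify(m, y1, y2, sig):
    """Recompute the commitments from (c, s1, s2) and compare the challenge."""
    c, s1, s2 = sig
    t1 = pow(y1, c, P) * pow(G2, s1, P) % P
    t2 = pow(y2, c, P) * pow(G1, s1, P) * pow(G2, s2, P) % P
    return c == H(m, y1, y2, G1, G2, t1, t2)
```

The single response `s1` appears in both verification equations, which is what forces the exponent of `G2` in `y1` to equal the exponent of `G1` in `y2`.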

The following two signatures of knowledge are based on the double discrete 
logarithm and the root of the discrete logarithm problem, respectively. 

Definition 2. Let $l \le k$ be a security parameter. An $(l+1)$-tuple 
$(c, s_1, \ldots, s_l) \in \{0,1\}^{l} \times \mathbb{Z}^{l}$ satisfying the 
equation 

\[ c = H(m \| y \| g \| a \| t_1 \| \cdots \| t_l), \quad \text{where } t_i = \begin{cases} g^{(a^{s_i})} & \text{if } c[i] = 0 \\ y^{(a^{s_i})} & \text{otherwise} \end{cases} \]

is a signature of the knowledge of a double discrete logarithm of $y$ to the 
bases $g$ and $a$, and is denoted $SKLOGLOG[\alpha : y = g^{(a^{\alpha})}](m)$. 

$SKLOGLOG[\alpha : y = g^{(a^{\alpha})}](m)$ is used for proving the knowledge 
of the double discrete logarithm of $y$ without revealing any information about 
$\alpha$. This SKLOGLOG can be computed only if the secret $\alpha$ is known. 



Definition 3. Let $l \le k$ be a security parameter. An $(l+1)$-tuple 
$(c, s_1, \ldots, s_l) \in \{0,1\}^{l} \times \mathbb{Z}_n^{*l}$ satisfying the 
equation 

\[ c = H(m \| y \| g \| e \| t_1 \| \cdots \| t_l), \quad \text{where } t_i = \begin{cases} g^{(s_i^{e})} & \text{if } c[i] = 0 \\ y^{(s_i^{e})} & \text{otherwise} \end{cases} \]

is a signature of the knowledge of an $e$-th root of the discrete logarithm of 
$y$ to the base $g$, and is denoted $SKROOTLOG[\alpha : y = g^{(\alpha^{e})}](m)$. 

$SKROOTLOG[\alpha : y = g^{(\alpha^{e})}](m)$ is used for proving the knowledge 
of an $e$-th root of the discrete logarithm of $y$ without revealing any 
information about $\alpha$. This SKROOTLOG can be computed only if the secret 
$\alpha$ is known. 
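
The two bitwise proofs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheme's actual parameters: the group is the toy subgroup of prime order `Q = 1019` in $\mathbb{Z}_P^*$ with `P = 2039` (the scheme itself uses a group whose order is an RSA modulus), the challenge is the first `L = 64` bits of SHA-256, and all names are hypothetical. Modular inverses via `pow(x, -1, Q)` need Python 3.8 or later.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only.
P, Q = 2039, 1019   # Q is prime and P = 2Q + 1 is prime
G = 4               # generator of the order-Q subgroup of Z_P^*
A = 2               # base "a" of the inner exponentiation, an element of Z_Q^*
E = 3               # exponent "e" of the root-of-discrete-log proof
L = 64              # number of challenge bits l


def H(*parts):
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[: L // 8], "big")


def bit(c, i):
    return (c >> i) & 1


def skloglog_sign(m, x):
    """Prove knowledge of x with y = G^(A^x)."""
    y = pow(G, pow(A, x, Q), P)
    # Toy choice: r_i >= x keeps every response non-negative.
    r = [x + secrets.randbelow(2**32) for _ in range(L)]
    t = [pow(G, pow(A, ri, Q), P) for ri in r]
    c = H(m, y, G, A, *t)
    s = [ri - x if bit(c, i) else ri for i, ri in enumerate(r)]
    return y, (c, s)


def skloglog_verify(m, y, sig):
    c, s = sig
    t = [pow(y if bit(c, i) else G, pow(A, si, Q), P) for i, si in enumerate(s)]
    return c == H(m, y, G, A, *t)


def skrootlog_sign(m, x):
    """Prove knowledge of x with y = G^(x^E); x must be invertible modulo Q."""
    y = pow(G, pow(x, E, Q), P)
    r = [1 + secrets.randbelow(Q - 1) for _ in range(L)]
    t = [pow(G, pow(ri, E, Q), P) for ri in r]
    c = H(m, y, G, E, *t)
    x_inv = pow(x, -1, Q)
    s = [ri * x_inv % Q if bit(c, i) else ri for i, ri in enumerate(r)]
    return y, (c, s)


def skrootlog_verify(m, y, sig):
    c, s = sig
    t = [pow(y if bit(c, i) else G, pow(si, E, Q), P) for i, si in enumerate(s)]
    return c == H(m, y, G, E, *t)
```

In both proofs each challenge bit selects whether the verifier recomputes the commitment from the base `G` or from `y`, which is why the signature length grows linearly in the number of challenge bits.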

C A Generic Fair Off-Line Electronic Cash System in the 
Multiple Bank Model 

In this section we describe a generic fair off-line e-cash system using a restrictive 
blind group signature scheme as a building block. 

A group signature provides signer’s anonymity control since a group manager, 
in case of later dispute, can reveal the identity of the signer. A blind group 
signature scheme [10] adds a blindness property to group signatures such that if 
the signer later sees a message he has signed, he will not be able to determine 
for whom he signed it. The blind group signature scheme thus provides both 
signer’s anonymity control and receiver’s anonymity. A restrictive blind group 
signature scheme adds a restrictive blindness property to blind group 
signatures. The 
restrictive blind signature protocol is defined by Brands in as follows. It is 
implemented using Chaum-pedersen’s blind signature scheme [^. 

Definition 4. Let $\tilde{m} \in G$ (in general, it can be a vector of 
elements) be the blinded value of $m$ such that the receiver at the start of a 
blind signature protocol knows a representation $(a_1, \ldots, a_k)$ of 
$\tilde{m}$ with respect to a generator-tuple $(g_1, \ldots, g_k)$. The signer 
verifies the internal structure of the blinded message $\tilde{m}$. Let 
$(b_1, \ldots, b_k)$ be the representation the receiver knows of the number $m$ 
after the protocol has finished. If there exist two functions $I_1$ and $I_2$ 
such that 

\[ I_1(a_1, \ldots, a_k) = I_2(b_1, \ldots, b_k), \]

regardless of $m$ and the blinding transformations applied by the receiver, 
then the protocol is called a restrictive blind signature protocol. The 
functions $I_1$ and $I_2$ are called blinding-invariant functions of the 
protocol with respect to $(g_1, \ldots, g_k)$. 

We construct the restrictive blind group signature scheme, the protocol W, in 
the following section. It is a modification of the blind group signature scheme 
in [10]. In our scheme, the restrictive blindness property of the restrictive 
blind group signature ensures that the correct identity of a client is embedded 
in coins, which is used in owner tracing. 

C.1 System Setup and Making Subprotocol 

We assume for the sake of simplicity that there is only one coin denomination, 
one group manager and one trustee. The extension to multiple denominations, 
several group managers and several trustees is easy. 

The group manager chooses a cyclic group $G$ of order $|G|$ and generators 
$g, g_1, g_2$ and $g_3$ such that computing discrete logarithms is infeasible, 
and then publishes $(G, g, g_1, g_2, g_3)$. The bank chooses a secret key and 
gets a membership certificate from the group manager. The trustee chooses its 
secret key $x_T \in_R \mathbb{Z}_{|G|}^{*}$, computes $y_T = g_2^{x_T}$ and 
$h_T = g_2^{x_T^{-1}}$, and publishes $y_T$ and $h_T$. 

To open an account at the bank, a client first proves his identity by means 
of, say, a passport. The client generates a random number $u_1$ and computes 
$I = g_1^{u_1}$, then sends $I$ to the bank. The bank stores the identifying 
information of the client and $I$ in the account database. $I$ serves as the 
account number of the client. 

In the construction of the system, a group signature scheme is converted into 
a restrictive blind group signature scheme. To control anonymity, the order of a 
cyclic group G of the base group signature should be known. The scheme in 
publishes the group order while the schemes in 0 do not. Although it seems to 
be possible to convert group signature schemes in into blind group signature 
schemes, it seems to be impossible to convert those schemes to restrictive blind 
group signature schemes. 



C.2 Procedure to Construct the Generic Fair Cash System 



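Client                                                      Bank 

$k \in_R \mathbb{Z}_{|G|}$, $\alpha$ 
$D_1 = g_2^{k}$, $D_2 = I \cdot y_T^{k}$ 
$m_0 = D_2 \cdot g_3$ 
$\tilde{m}_0 = m_0^{\alpha} = (D_2 \cdot g_3)^{\alpha}$ 
$U$ = proof of $\tilde{m}_0$ structure    ---- $\tilde{m}_0$, $U$ ---->    verify $U$ 

$\tilde{m}_0$    <---- Restrictive Blind Group Signature Scheme ---->    $\tilde{m}_0$ 

$S_B(m_0)$ 

Coin: $\{D_1, S_B(m_0), V_U, V_E\}$, 
where $S_B(m_0) = S_B(D_2 \cdot g_3)$, 
$V_U$ = proof of unblinding of the commitment, 
$V_E$ = proof of the correct Elgamal encryption of $I$ 

Fig. 1. The generic fair scheme in distributed banking 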



Let $g_1$ and $g_2$ be generators and $y_T$ be the trustee’s public key. The 
Elgamal encryption of the client identity $I (= g_1^{u_1})$ with the trustee’s 
public key consists of $\{D_1, D_2\}$ where 

\[ D_1 = g_2^{k}, \qquad D_2 = I \cdot y_T^{k}. \]

The trustee can recover the client identity from $\{D_1, D_2\}$ with his secret 
key $x_T$ as follows: 

\[ D_2 / D_1^{x_T} = I \cdot g_2^{x_T k} / g_2^{k x_T} = I. \]
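
The encryption and the trustee's decryption can be sketched in a few lines of Python. The group (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$), the generators `G1` and `G2`, and the function names are assumptions chosen for a toy example and are far too small to be secure; `pow(..., -1, P)` requires Python 3.8 or later.

```python
import secrets

# Toy parameters, assumed for illustration only.
P, Q = 2039, 1019     # subgroup of prime order Q in Z_P^*
G1, G2 = 4, 9         # hypothetical generators g1, g2 of that subgroup


def trustee_keygen():
    """Return (x_T, y_T) with y_T = g2^x_T."""
    x_t = 1 + secrets.randbelow(Q - 1)
    return x_t, pow(G2, x_t, P)


def encrypt_identity(identity, y_t):
    """Elgamal-encrypt the client identity I as (D1, D2) = (g2^k, I * y_T^k)."""
    k = 1 + secrets.randbelow(Q - 1)
    d1 = pow(G2, k, P)
    d2 = identity * pow(y_t, k, P) % P
    return d1, d2


def owner_trace(d1, d2, x_t):
    """Trustee's decryption: I = D2 / D1^x_T."""
    return d2 * pow(pow(d1, x_t, P), -1, P) % P
```

Only the holder of `x_t` can strip the mask `y_t^k` from `d2`, which is what lets the trustee, and nobody else, map a coin back to an account.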

In our generic fair cash scheme, the coin consists of three parts: 

\[ Coin = \{D_1, S_B(D_2 \cdot g_3), V\}. \]

$D_1$ and $D_2$ are the Elgamal encryption of the client identity. 
$S_B(D_2 \cdot g_3)$ is generated through the restrictive blind signature 
scheme. $V$ is the proof which concatenates the proof $V_U$ of unblinding of 
the commitment and the proof $V_E$ of the correct Elgamal encryption of $I$. 
Now we describe the procedures to construct the generic fair e-cash system. The 
generic fair off-line e-cash system is shown in Figure 1. 

1. Commitment construction: $(D_2 \cdot g_3)^{\alpha}$, which is the blinded 
value of $D_2 \cdot g_3$ with the random value $\alpha$, is sent to the bank or 
constructed by the bank using committed values. Another commitment to be used 
in coin tracing is also sent to the bank. 

2. Proof of the commitment structure: The blind group signature scheme is 
converted into the restrictive blind group signature scheme by making the 
bank check the structure of the commitments. The proof is generated to 
show that $(D_2 \cdot g_3)^{\alpha}$ is really the blinded value of 
$D_2 \cdot g_3 = I \cdot y_T^{k} \cdot g_3$. Another proof, which makes sure 
that the trustee can trace the coin from the withdrawal instance, is also sent 
to the bank. The two proofs are merged in our scheme. 

3. Signature generation through the restrictive blind group signature scheme: 
When the client withdraws coins, the bank generates signatures through the 
restrictive blind group signature scheme, so the client is unable to blind the 
internal structure of the commitment. 

4. Proof of unblinding of the commitment: To enable the trustee to recover 
the client identity from the coin, the client must unblind the commitment 
$(D_2 \cdot g_3)^{\alpha}$ into $D_2 \cdot g_3$. The unblinding is proved by 
showing that the client knows the representation of $D_2 \cdot g_3 / g_3$ with 
respect to $(g_1, y_T)$. In our scheme $V$ is the proof of the unblinding. 

5. Proof of the correct Elgamal encryption of $I$: The Elgamal encryption 
consists of $D_1$ and $D_2$. For the trustee to recover the client identity 
from the coin, the exponent of $g_2$ in the representation of $D_1$ must be 
equal to the exponent of $y_T$ in the representation of $D_2 \cdot g_3$. In our 
scheme the proof $V$ also acts as the proof of the correct Elgamal encryption. 

The generic fair e-cash system provides three functionalities in the single bank 
model, i.e., owner tracing, coin tracing and recovering the identity of a double 
spender. To make owner tracing possible, we insert into the coin the Elgamal 
encryption of the client identity under the trustee’s public key. The trustee 
can recover the client identity from the coin by decrypting the Elgamal 
encryption. To make coin tracing possible, the bank must look into the 
structure of the commitment in the withdrawal protocol. The trustee can then 
trace the coin from the commitments. To allow the client identity to be 
recovered when double spending happens, the client is required to prove 
knowledge of the random values used in the coin by responding to a challenge 
from the merchant. If double spending happens, the bank can learn the random 
values and calculate the client identity. 

In the multiple bank model another functionality, signer tracing, is provided 
by the blind group signature scheme which is used as a subprotocol in the 
generic fair cash system. 



D The Implementation of the Generic Fair Electronic 
Cash System in the Multiple Bank Model 

In this section, we implement a fair off-line e-cash system in which a group of 
banks distributes electronic cash. First the restrictive blind group signature 
scheme, the protocol W, is constructed. Using the protocol W as a subprotocol, 
we implement a generic fair off-line e-cash system in the multiple bank model. 

D.1 System Setup 

To set up the group signature scheme, the group manager chooses a security 
parameter $l$ and computes the following values: 

1. An RSA public key $(n, e)$ and secret key $d$ such that 
$ed \equiv 1 \pmod{\varphi(n)}$, where the length of $n$ is at least $2l$ bits. 

2. A cyclic group $G$ of order $n$ and generators $g, g_1, g_2$ and $g_3$ such 
that computing discrete logarithms is infeasible. In particular, we can choose 
$G$ to be a cyclic subgroup of $\mathbb{Z}_P^*$ where $P$ is a prime and 
$n \mid (P - 1)$. 

3. An element $a \in \mathbb{Z}_n^*$ where $a$ has large multiplicative order 
modulo all the prime factors of $n$. 

4. An element $t$ where the logarithm of $t$ to the base $a$ is unknown to 
group members. 

5. An upper bound $\lambda$ on the length of the secret keys and a constant 
$\mu > 1$. 

The group’s public key is $(n, e, G, g, g_1, g_2, g_3, a, \lambda, \mu, t)$. 

When a bank wants to join the group, it picks a secret key 
$x \in_R \{0, 1, \ldots, 2^{\lambda} - 1\}$ and calculates 
$y = a^{x} \pmod n$ and the membership key $z = g^{y}$. The bank commits to $y$, 
sends $(y, z)$ to the group manager and proves that it knows $x$ (without 
actually revealing $x$) using techniques similar to the signature of knowledge 
of a discrete logarithm [3]. If the group manager is convinced that the bank 
knows $x$, the group manager gives the bank a membership certificate 
$v = (y + t)^{d} \pmod n$. (Note that this certificate structure differs from 
the original certificate structure in [5] in order to prevent a coalition 
attack. This form appears in [1].) It is an additional security assumption that 
computing $v$ without factoring $n$ is hard. 
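
The join step can be traced with toy numbers. The sketch below is only an illustration under assumed parameters: an RSA modulus $n = 33$ with $e = 3$, $d = 7$, the prime $P = 67$ (so that $n \mid P - 1$), a generator `G = 4` of order 33, and arbitrary public values `A` and `T`. In a real instantiation $n$ would be large and the logarithm of `T` to the base `A` would be unknown to the banks.

```python
# Toy parameters, assumed for illustration only.
N_RSA, E, D = 33, 3, 7   # n = 3 * 11, and e * d = 21 = 1 (mod phi(n) = 20)
P = 67                   # prime with n | P - 1
G = 4                    # element of order n = 33 in Z_P^*
A, T = 2, 5              # public base a in Z_n^* and public element t


def bank_keys(x):
    """Bank side: y = a^x mod n and membership key z = g^y."""
    y = pow(A, x, N_RSA)
    z = pow(G, y, P)
    return y, z


def issue_certificate(y):
    """Group manager side: v = (y + t)^d mod n."""
    return pow(y + T, D, N_RSA)


def certificate_ok(y, v):
    """The bank can check its certificate with the public exponent e."""
    return pow(v, E, N_RSA) == (y + T) % N_RSA


def membership_relation_ok(z, v):
    """Relation later proved in zero knowledge: z * g^t = g^(v^e)."""
    return z * pow(G, T, P) % P == pow(G, pow(v, E, N_RSA), P)
```

The second check works because `G` has order exactly `N_RSA`, so exponents of `G` can be reduced modulo the RSA modulus.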

A client has his account $I$ at the bank. To open his account, he selects a 
random number $u_1$ and computes his account number $I = g_1^{u_1}$. 

The trustee chooses its secret key $x_T \in_R \mathbb{Z}_n^{*}$ such that 
$y_T = g_2^{x_T}$ and $h_T = g_2^{x_T^{-1}}$. The trustee publishes $y_T$ and 
$h_T$. 



D.2 The Protocol W 



[Figure: protocol diagram between Player A, who obtains 
$(m, \hat{g}, \hat{z}, V_1, V_2)$, and Player B, who holds $(x, y, v, g)$.] 

Fig. 2. The protocol W 



The protocol W is a restrictive blind group signature scheme which is a 
modification of the blind group signature scheme in [10]. To make the protocol 
W a restrictive blind group signature scheme, the commitment $g$ is generated by 
Player A, while it is generated by Player B in the blind group signature scheme 
in [10]. The modified version of the underlying group signature scheme in [10] 
is also used in the protocol W in order to make coalition attacks hard. The 
protocol W is shown in Figure 2. 

Player A gets Player B’s blind group signature, which consists of 
$(m, \hat{g}, \hat{z}, V_1, V_2)$ where $V_1$ and $V_2$ are the blind signatures 
of knowledge. The blind signature of knowledge 

\[ V_1 = (c, s_i^{LL} \text{ for } 1 \le i \le l) = SKLOGLOG[x : \hat{z} = \hat{g}^{(a^{x})}](m) \]

assures that the group manager can determine the identity of the signer. The 
blind signature of knowledge 

\[ V_2 = (c, s_i^{RL} \text{ for } 1 \le i \le l) = SKROOTLOG[v : \hat{z}\hat{g}^{t} = \hat{g}^{(v^{e})}](m) \]

proves that the signer is one of the group members. 

In this protocol $w$ is used in blinding $g$ and $z$. $\sigma$ (a permutation) 
is used in blinding $c$, and $a_i^{LL}$ $(1 \le i \le l)$ and $b_i^{RL}$ 
$(1 \le i \le l)$ are used to blind $s_i^{LL}$ $(1 \le i \le l)$ and 
$s_i^{RL}$ $(1 \le i \le l)$, respectively. So $\sigma$ and $a_i^{LL}$ 
$(1 \le i \le l)$ blind the proof $V_1$, and $\sigma$ and $b_i^{RL}$ 
$(1 \le i \le l)$ blind the proof $V_2$. 



D.3 The Withdrawal Protocol 



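Client                                              Bank 
$(I, g_1, g_2, g_3, h_T, y_T)$                      $(g, x, y, v, g_1, g_2, g_3, h_T, y_T, t)$ 

$k, b \in_R \mathbb{Z}_n$, $\alpha \in_R \mathbb{Z}_n^{*}$ 
$U = SKREP[(\alpha, k) : M_1 = I^{\alpha} \wedge M_2 = y_T^{k} \wedge M_3 = g_3^{\alpha} \wedge M_4 = M_2^{\alpha} \wedge M_5 = h_T^{k}]$ 

        ---- $M_1, M_2, M_3, M_4, M_5, U$ ---->         verify $U$ 

Both sides make $\tilde{m}_0 = M_1 \cdot M_4 \cdot M_3 = I^{\alpha} \cdot (y_T^{k})^{\alpha} \cdot g_3^{\alpha} = (I y_T^{k} g_3)^{\alpha}$ 

$(g_2, \tilde{m}_0, \alpha^{-1})$    <---- The protocol W ---->    $(x, y, v, \tilde{m}_0)$ 

The client obtains the blind group signature on $m_0$, including $V_1$ and $V_2$. 

Fig. 3. The withdrawal protocol in distributed banking 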



When the client wants to withdraw a coin, he first must prove ownership 
of his account by any means. Then the withdrawal protocol is performed. We 
use the protocol W as a subprotocol of the withdrawal protocol. The withdrawal 
protocol is shown in Figure 3. The client first constructs the commitment 
$(D_2 \cdot g_3)^{\alpha}$ and generates the signature $U$ of the commitment 
structure as described in Section 4. The signature of knowledge 

\[ U = (c, s_1, s_2) = SKREP[(\alpha, k) : M_1 = I^{\alpha} \wedge M_2 = y_T^{k} \wedge M_3 = g_3^{\alpha} \wedge M_4 = M_2^{\alpha} \wedge M_5 = h_T^{k}] \]

demonstrates two facts to the bank. First, $(D_2 \cdot g_3)^{\alpha}$ is really 
the blinded value of $D_2 \cdot g_3 = I \cdot y_T^{k} \cdot g_3$. Second, the 
exponent $k$ in $M_5 = h_T^{k}$ is equal to the exponent $k$ in 
$D_2 = I \cdot y_T^{k}$, which allows the trustee to trace the coin. 

The client commits to $(D_2 \cdot g_3)^{\alpha}$ and the bank uses the 
restrictive blind group signature protocol to generate the signature of 
$D_2 \cdot g_3$. By the restrictiveness property, the client is unable to blind 
the internal structure of the commitment. This assures that the structure of 
the commitment $(D_2 \cdot g_3)^{\alpha}$ and the structure of $D_2 \cdot g_3$ 
are identical. That is, 

\[ I_1(\alpha \cdot u_1, \alpha \cdot k, \alpha \cdot 1) = I_2(u_1, k, 1) = u_1 : k : 1 \]

where $I_1$ and $I_2$ are blinding-invariant functions of the protocol with 
respect to $(g_1, g_2, g_3)$. 
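
The invariance of the exponent ratio under blinding can be checked numerically. The following Python sketch is an illustration only: it assumes a toy prime group order `Q`, takes both blinding-invariant functions to be the same ratio map, and uses hypothetical function names.

```python
Q = 1019  # toy prime group order, assumed for illustration


def blind(exponents, alpha):
    """Raising a group element to alpha multiplies every exponent by alpha."""
    return tuple(alpha * a % Q for a in exponents)


def invariant(exponents):
    """Ratio a1 : a2 : a3, normalised so that the last entry equals 1."""
    inv = pow(exponents[-1], -1, Q)  # needs Python 3.8+
    return tuple(a * inv % Q for a in exponents)
```

For any nonzero `alpha`, `invariant(blind((u1, k, 1), alpha))` equals `(u1, k, 1)`, so the client's blinding cannot change which identity and which `k` are embedded in the signed value.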

To allow the trustee to recover the client’s identity $I$ from the coin, the 
client unblinds the commitment $(D_2 \cdot g_3)^{\alpha}$ into $D_2 \cdot g_3$ 
and embeds it into the coin. This unblinding is proved by $V$: 

\[ V = (c, s_1, s_2) = SKREP\Big[(k, u_1) : D_1 = g_2^{k} \wedge D_2 = \frac{D_2 \cdot g_3}{g_3} = \frac{g_1^{u_1} y_T^{k} g_3}{g_3}\Big] \]

The signature of knowledge $V$ shows that the client knows the representation 
of $D_2 \cdot g_3 / g_3$ with respect to $(g_1, y_T)$ and hence assures that the 
client correctly unblinds the commitment. Simultaneously, $V$ proves that the 
exponent $k$ in $D_1$ is equal to the exponent $k$ in $D_2 \cdot g_3$. Note that 
$D_2$ can be extracted from $D_2 \cdot g_3$ and $D_1$ is made by the client 
himself. Since the Elgamal encryption of the client’s identity $I$ consists of 
$D_1$ and $D_2$, $V$ eventually proves that $I$ is revocable from the coin by 
the trustee. 

D.4 The Payment Protocol 

With the coin and the random numbers $k, b$ which were generated in the 
withdrawal protocol, the client initiates the payment protocol. The client gets 
$ID_{Merchant}$ and $Date$ and calculates $(c_p, s_p)$ using the hash function 
$H$: 

\[ c_p = H(ID_{Merchant} \| Date), \qquad s_p = b - k c_p \pmod n. \]

The payment protocol is shown in Figure 4. The client sends 
$\{Coin, Date, (c_p, s_p)\}$ to the merchant. The merchant then checks whether 
$Date$ is within the allowed time interval and verifies $(V_1, V_2)$ and $V$. 
The merchant also verifies that the equation 
$g_2^{s_p} \cdot (g_2^{k})^{c_p} = g_2^{b}$ is satisfied, where $g_2^{k} = D_1$. 
If all verifications are successful, the merchant checks whether the paid coin 
is double-spent by searching its stored coins. When the merchant deposits 
coins, the bank also checks whether the deposited coins are double-spent by 
searching its stored coins. 
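
The payment response and the merchant's check can be sketched as follows. The sketch assumes a toy group of prime order `Q` in place of the order-$n$ group, SHA-256 as the hash, and that the merchant learns $g_2^{b}$ from the coin; the function names are hypothetical.

```python
import hashlib

# Toy parameters, assumed for illustration only.
P, Q = 2039, 1019   # subgroup of prime order Q in Z_P^*
G2 = 9              # hypothetical generator g2


def challenge(merchant_id, date):
    """c_p = H(ID_Merchant || Date), reduced into Z_Q."""
    data = f"{merchant_id}|{date}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def pay(k, b, merchant_id, date):
    """Client side: s_p = b - k * c_p."""
    c_p = challenge(merchant_id, date)
    return c_p, (b - k * c_p) % Q


def merchant_check(d1, g2_b, merchant_id, date, c_p, s_p):
    """Merchant side: g2^s_p * D1^c_p must equal g2^b."""
    if c_p != challenge(merchant_id, date):
        return False
    return pow(G2, s_p, P) * pow(d1, c_p, P) % P == g2_b
```

Because the challenge binds the merchant identity and the date, spending the same coin twice yields two different responses for the same `k` and `b`.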

Client                                              Merchant 
$(Coin, k, b, ID_{Merchant}, Date)$                 $(n, e, G, g, g_1, g_2, g_3, a, \lambda, \mu, t)$ 

$c_p = H(ID_{Merchant} \| Date)$ 
$s_p = b - k c_p \pmod n$ 

        ---- $Coin, Date, (c_p, s_p)$ ---->         verify $(V_1, V_2)$, $V$, $Date$ 
                                                    verify $g_2^{s_p} \cdot (g_2^{k})^{c_p} = g_2^{b}$ 
                                                    $\{Coin, (c_p, s_p)\}$ 

Fig. 4. The payment protocol 



D.5 Anonymity Revocation 

In this implementation, the coin is as follows: 

\[ Coin = \{D_1, S_B(D_2 \cdot g_3), V\} \]
where Sb{D2 -93) = {92, D2 ■ 93, {^2 • ^2}- 

Double Spending Detection. When the client double spends the coin at the 
same merchant, the two $Date$s are different, and therefore the two pairs 
$(c_p, s_p)$ are different. Then the merchant calculates the client’s identity 
$I$ from $(c_1, s_1)$ and $(c_2, s_2)$ by the following calculation: 

\[ k = \frac{s_1 - s_2}{c_2 - c_1} \pmod n, \qquad I = D_2 / y_T^{k}. \]

Indeed, $s_1 - s_2 = (b - k c_1) - (b - k c_2) = k (c_2 - c_1)$. 

In case of double spending of the same coin at different merchants, the two 
$ID_{Merchant}$s are different and hence the two pairs $(c_p, s_p)$ are 
different. After the two merchants each deposit the coin, the bank knows that 
double spending has occurred. Then the bank calculates the client’s identity 
$I$ by the calculation above. 
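
A short Python sketch of this extraction, under the assumption that the responses have the form $s = b - k c$ and that $D_2 = I \cdot y_T^{k}$: two responses for the same coin reveal $k$, and $k$ unmasks the identity. The toy group and all names are assumptions for illustration; with a composite modulus the difference $c_2 - c_1$ would additionally have to be invertible.

```python
# Toy parameters, assumed for illustration only.
P, Q = 2039, 1019   # subgroup of prime order Q in Z_P^*
G1, G2 = 4, 9       # hypothetical generators g1, g2


def extract_k(c1, s1, c2, s2):
    """Solve s1 - s2 = k * (c2 - c1) for k."""
    return (s1 - s2) * pow((c2 - c1) % Q, -1, Q) % Q


def recover_identity(d2, y_t, k):
    """I = D2 / y_T^k."""
    return d2 * pow(pow(y_t, k, P), -1, P) % P
```

A single payment reveals nothing about `k`, since `b` acts as a one-time mask; only a second response with the same `b` makes the linear system solvable.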

Owner Tracing. The trustee can recover the client identity from the coin by 
computing 

\[ D_2 \cdot D_1^{-x_T} = I. \]

Coin Tracing. The trustee can trace the coin from the record of the withdrawal 
protocol which is kept in the bank by computing 

\[ M_5^{x_T} = (h_T^{k})^{x_T} = (g_2^{x_T^{-1} k})^{x_T} = g_2^{k} = D_1. \]

Signer Tracing. Given a signature $(\hat{g}, \hat{z}, V_1, V_2)$ for a message 
$m$, the group manager can determine the signer by testing whether 
$\hat{g}^{y_P} = \hat{z}$ for every group member $P$ (where 
$y_P = \log_g z_P$ and $z_P$ is $P$’s membership key). Thus the running time 
of this scheme is linear in the size of the group. 
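
The linear scan performed by the group manager can be sketched as follows. The toy group ($\mathbb{Z}_{67}^*$ with a generator of order 33), the member table and the function name are assumptions for illustration only.

```python
# Toy parameters, assumed for illustration only.
P = 67   # prime modulus
G = 4    # element of order 33 in Z_P^*


def open_signature(g_hat, z_hat, members):
    """Return the member whose y_P satisfies g_hat^y_P = z_hat, or None.

    `members` maps a bank name to its y_P (so that z_P = G^y_P).
    """
    for name, y_p in members.items():
        if pow(g_hat, y_p, P) == z_hat:
            return name
    return None
```

Each opening costs one exponentiation per registered bank, which matches the linear running time stated above.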

D.6 Efficiency and Security 

In our generic e-cash system, owner tracing is bank account based as in [8]. 
Thus from the viewpoint of coin tracing and owner tracing our scheme is as 
efficient as the one in [8], which is one of the most efficient schemes to date. 
With regard to signer tracing our scheme is as efficient as the one in [10] in 
opening the signature to recover the signer’s identity. 

The size of the fair group signature depends on the size of the underlying 
blind group signature: it is the size of $S_B(D_2 \cdot g_3)$ plus the size of 
$D_1$ and $V$. 

The security of our fair off-line e-cash system depends on the security of the 
underlying blind group signature schemes and the signatures of knowledge. The 
security of the scheme is based on the assumptions that the discrete logarithm, 
double discrete logarithm, and root of discrete logarithm problems are hard. 
In addition, it is based on the security of the Schnorr and RSA signature 
schemes and on the assumption that computing membership certificates is hard. 
The group signature scheme in [5] is not coalition resistant, but can be easily 
fixed. So in our implemented scheme we use the fixed version of the group 
signature scheme in [5], which is mentioned in [1]. The fixed group signature 
scheme seems to be coalition resistant, but is not provably coalition 
resistant. 



E Conclusion 

In this paper we have proposed an efficient and secure fair off-line electronic 
cash system that a group of banks can distribute. Distributed electronic 
banking reflects the real world more closely than the single bank model and was 
considered by Lysyanskaya and Ramzan in [10]. They suggested that banks form a 
group and clients form another group to provide both client and bank anonymity 
control through blind group signatures. In our approach, first a generic fair 
off-line e-cash system is developed to provide client anonymity control and 
then the blind group signature scheme is applied to a group of banks to provide 
bank anonymity control. Our scheme provides coin tracing, which the scheme 
suggested in [10] does not provide. Furthermore, when double spending occurs in 
the scheme in [10], merchants or banks have to ask the trustee to revoke the 
identity of the double spender from the double-spent coin. In our scheme 
merchants or banks can revoke the identity of the double spender from the 
instances of payment with the same coin. Our scheme is more efficient since the 
clients need not be involved in extra group signature protocols, and hence the 
size of cash in our scheme is half of that in the scheme suggested in [10]. 


Abstract. This paper significantly improves the message complexity 
of perfect asynchronous secure computations among $n$ players tolerating 
a computationally unbounded active adversary that corrupts up 
to $t < n/4$ players. The protocol presented in this paper communicates 
Cl(mn^ Ig |IF| -|-mn^lgn) bits and broadcasts 0{mn^) bits, where m is 
the number of multiplication gates in the circuit. This is to be compared 
with the most efficient perfect secure asynchronous protocol known so 
far, namely the protocol of [5], which requires 0{mn^ lg\W\ + mn^lgn) 
bits of communication apart from 0{mn'^ lg»^) bits of broadcast. 



A Introduction 

Consider the scenario where there are $n$ players (or processors) 
$\mathcal{P} = \{P_1, P_2, \ldots, P_n\}$, each $P_i$ with a local input $x_i$. 
These players trust neither each other nor the channels by which they 
communicate. Nevertheless, they wish to correctly compute a function 
$\mathcal{F}(x_1, \ldots, x_n)$ of their local inputs, while keeping their 
local data as private as possible. This is the problem of secure multiparty 
computation, which has been extensively studied in several models of 
computation. The communication facilities assumed in the underlying network 
differ with respect to whether secure communication 
channels are available or not available [ 116110 ] . whether or not broadcast 

channels are available mm and whether the communication channels are 
synchronous or asynchronous m- The correctness requirements of the protocol 
differ with respect to whether an exponentially small probability of error is 
allowed (unconditional) or not allowed (perfect). The corrupted players are 
usually modeled via a central adversary that can corrupt up to $t$ players. The 
adversary may be computationally bounded (computational setting) or unbounded 
(secure channels setting). One also generally distinguishes between actively 
corrupted players (Byzantine) and passively corrupted players (eavesdropping). 

We consider the problem of perfect asynchronous secure multiparty computation 
over a fully connected network of $n$ players (processors) in which up to $t$ 
Byzantine faults may occur, where every two players are connected via a secure 
and reliable communication channel (secure channels setting). The network is 
asynchronous, meaning that any message sent on these channels can have 
arbitrary (finite) delay. We model the faulty players’ behavior through a 
computationally unbounded adaptive adversary. This adversary may choose, at any 
stage of the protocol, additional players to corrupt, as long as he does not 
exceed the limit of $t$ faulty players. To model the network’s asynchrony, we 
further assume that the adversary can schedule the communication channels, i.e., 
he can determine the time delays of all the messages (however, he can neither 
read nor change those messages). We call such an adversary a $t$-adversary. 

It was shown in [5] that perfect asynchronous secure multiparty computation 
is possible in the above setting if and only if $t < n/4$. In [7] an 
$(\lceil n/3 \rceil - 1)$-resilient protocol is described that securely computes 
any function $\mathcal{F}$ when an exponentially small probability of error is 
allowed. Although theoretically impressive, these results fall short in 
practical feasibility. The complicated exchanges of messages and zero-knowledge 
proofs in protocols like [5,7] might render them impractical. 

In the synchronous model of communication one has often attempted to re- 
duce the communication complexity of multiparty protocols 1114131141151 . More 
recently, m significantly improved the message complexity of secure synchronous 
unconditional multi party computations through a generic framework that strictly 
distinguishes between fault-detection and fault-correction and uses a technique 
called player elimination to efficiently transform a distributed protocol with fault 
detection into a fully resilient protocol. The problem of reducing the communica- 
tion complexity of secure multiparty computation over an asynchronous network 
was left open in HZ). 

We concentrate on reducing the communication complexity of perfect 
asynchronous secure protocols. The notion of communication complexity of secure 
computations in an asynchronous network has not received much attention. The 
only known solution for error-less asynchronous secure computation was 
proposed in [5]. Even for this protocol the communication complexity was not 
fully analyzed. We provide a simple analysis of the communication complexity of 
[5] and then show that our protocol has significantly smaller communication 
complexity. 



B Towards Efficient Asynchronous Secure Protocols 

Each player $P_i$ holds a private input $x_i$, and the players wish to securely 
compute the exact value of a function $\mathcal{F}(x_1, \ldots, x_n)$. However, 
since the network is asynchronous and $t$ players may be faulty, the players 
can never wait for more than $n - t$ of the inputs to be entered into the 
computation. Furthermore, 
the missing inputs are not necessarily of the faulty players. Formalizing this con- 
cept is a very delicate issue and we refer the reader to m for complete formal 
definitions of asynchronous secure computations. 

Most distributed resilient protocols in the literature are derived from a 
private protocol that processes the circuit in steps, where after each step 
some consistency checks are performed. If an inconsistency is detected, then a 
fault-recovery procedure is invoked. These protocols are in general much less 
communication-efficient than their private (but non-resilient) counterparts, 
since the adversary can provoke fault-recovery after every step even by 
corrupting only a single player. In this section, we describe an efficient 
resilient protocol with a substantially reduced number of invocations of the 
fault-recovery procedure. Each time a fault 
is detected, at least one corrupted player is eliminated from the further compu- 
tation m- Hence the number of fault-recovery invocations is bounded by the 
maximal number of corrupted players and is independent of the circuit size. 

Furthermore, fault detection requires that all players agree on whether or not 
a fault occurred. If broadcast is not available as a primitive, a Byzantine 
agreement (BA) protocol is required.^1 Since protocols simulating broadcast 
have substantial communication complexity, we need to amortize them over a 
sequence of several steps. For this, we divide the circuit into segments, each 
consisting of a sequence of steps. In general, if a fault occurs in a step 
within a segment, the players must know this immediately after that step to 
prevent a violation of privacy in subsequent steps. But it is sufficient that a 
player who detects a fault informs all other players (without using 
broadcast). This is only a partial fault detection. Each player informed about 
a fault finishes the computation of the current segment with dummy values. Only 
at the end of the segment is a strict fault detection performed and, if 
required, fault localization, player elimination, and fault correction 
protocols are executed. When a segment fails and fault localization yields an 
$(r,p)$-localization^2 $\mathcal{D} \subseteq \mathcal{P}$ (our protocol yields 
a $(1,3)$-localization), a first idea could be to eliminate these players and 
re-share all the intermediate values of the current computation according to 
the new setting. However, the number of intermediate values is potentially 
super-polynomial in $n$, hence such an approach is potentially inefficient. 
Rather, we use a lazy re-sharing: whenever an $(r,p)$-localization takes place, 
the set $\mathcal{P}$ of players is restricted to 
$\mathcal{P} \setminus \mathcal{D}$, and $t$ is reduced to $t - r$, but no 
re-sharing is performed. For each gate, immediately before its evaluation, only 
that gate’s inputs are re-shared, if necessary. It is important that this lazy 
re-sharing preserves the ability to continue the computation. That our 
implementation of lazy re-sharing has this property will be clear in the 
sequel. 

We analyze the communication complexity of our protocols. The computation
complexity is not analyzed because the communication complexity is the
bottleneck of such protocols (they evaluate circuits in which the number of
linear gates, which require no communication, is not very large compared to the
number of multiplication gates). The considered protocols make extensive use of
a BA primitive, and hence their efficiency depends heavily on the communication
complexity of the applied BA protocol. Therefore, we count the BA message
complexity (BAC) and the message complexity (MC) separately.

* Clearly, broadcast can be implemented by Byzantine Agreement (BA). Nevertheless,
Bracha [?] describes a simple $(\lceil n/3 \rceil - 1)$-resilient Broadcast protocol
for a Byzantine setting. The termination property of Broadcast is much weaker than
the termination property of BA: for Broadcast, we do not require that the good
parties complete the protocol in case the sender is corrupted. We do not
distinguish BA from Broadcast, since in the complexity analysis we use the
sub-protocol for simulating broadcast channels only as a black box.

** An $(r,p)$-localization is a set $\mathcal{D}$ with $|\mathcal{D}| = p$ players,
guaranteed to contain at least $r$ cheaters.

C The Protocol Construction 

Efficient protocols for perfect asynchronous secure computation can be con- 
structed based on sub-protocols for Input Sharing, Agreement on a Common 
Subset, Linear Combination, Multiplication, Degree Reduction, Segment Fault 
Localization and Output Reconstruction. 

In a nutshell, an agreed function is computed as follows. Let $x_i$ be the input
of $P_i$. Let $\mathbb{F}$ be a finite field known to all parties with
$|\mathbb{F}| > n$, and let $\mathcal{F} : \mathbb{F}^n \to \mathbb{F}$ be the
function to be computed. We assume that the players have an arithmetic circuit
computing $\mathcal{F}$; the circuit consists of linear combination gates and
multiplication gates of in-degree 2. All the computations in the sequel are done
in $\mathbb{F}$.

The outline of the protocol follows. First, each player secret-shares its input
among the players using the sub-protocol for Input Sharing. This sub-protocol
runs in two phases. In the first phase it uses a technique similar to Shamir's
secret sharing scheme [19], extended to a two-dimensional sharing [?]. In
the second phase, the players agree, using the protocol for Agreement on a
Common Subset, on a core set $C$ of players that have successfully shared their
input. Once $C$ is computed, the parties proceed to compute the function in the
following way. First, the input values of the players not in $C$ are set to a
default value, say 0. Then the players evaluate the given circuit gate by gate
using the sub-protocols for Linear Combination and Multiplication. Finally, the
output value is reconstructed towards each of the players using the sub-protocol
for Output Reconstruction. The sub-protocol for Multiplication applies the
player-elimination technique: either the outcome is a proper sharing of the
correct product, or a fault is detected. In the latter case, the sub-protocol for
Segment Fault Localization is applied to determine a localization $\mathcal{D}$.
Then the players in $\mathcal{D}$ are eliminated from the further protocol, and
the shared values are re-shared for this new setting involving fewer players
using the sub-protocol for Degree Reduction.

C.1 Agreement on a Common Subset

In a perfect asynchronous resilient computation, the players in $\mathcal{P}$
very often need to decide on a subset of players of size at least
$(n - t) \ge 3t + 1$, where $n = |\mathcal{P}|$, all of whom satisfy some
property. It is known that all the honest players will eventually satisfy this
property, but some faulty players may satisfy it as well. In our context, we
need to agree on the set $C$ of players who have correctly completed sharing
their input. To implement this primitive, we directly adopt the protocol
presented in [7]. The idea is to execute a BA (Byzantine Agreement) protocol for
each player, to determine whether it will be in the agreed set. Notice that if
some $P_j$ knows that $P_i$ satisfies the required property, then eventually
all the players will know the same. Thus, we can suppose the existence of a
predicate $Q$ that assigns a binary value to each player $P_i$, denoted $Q(i)$,
based on whether $P_i$ has satisfied the property as yet. The protocol is denoted
by AgreeSet[$Q, \mathcal{P}, t$]. For a detailed description of the protocol,
see [?].

Theorem 1 ([?]). Using the protocol AgreeSet[$Q, \mathcal{P}, t$], the players
indeed agree on a common subset of players, denoted by $C$, such that
$|C| \ge (|\mathcal{P}| - t)$. Moreover, for every $P_j$ in $C$, we have
$Q(j) = 1$. The AgreeSet[$Q, \mathcal{P}, t$] protocol has a Byzantine Agreement
complexity of $O(|\mathcal{P}|)$.

C.2 Input Sharing and Output Reconstruction 

The protocol for Input Sharing has two phases: first, each party shares its input
using an Asynchronous Verifiable Secret Sharing (AVSS)* scheme; next, the
parties use the AgreeSet protocol to agree on a set $C$ of at least
$(|\mathcal{P}| - t)$ parties who have shared their inputs properly. Our
implementations of the protocols InputShare and OutputReconstruction are quite
similar to the ones existing in the literature. In our protocol, each player
$P_i$ uses a variation of the V-Share protocol of [?], denoted AV-Share, to share
its input in the first phase of the Input Sharing protocol. The second phase is
implemented using the AgreeSet protocol. The protocols AV-Share, InputShare and
OutputReconstruction are formally specified in Fig. 1.

Implementation. To implement the input sharing, each player $P_i$ uses a
variation of the V-Share protocol of [?], denoted AV-Share, to share its input in
the first phase of the Input Sharing protocol. The AV-Share protocol differs from
the V-Share protocol in the following ways:

- In the AV-Share protocol, the dealer (the player with the secret that is to be
  shared among all players) chooses a polynomial
  $p(x,y) = \sum_{i=0}^{t} \sum_{j=0}^{t} r_{ij} x^i y^j$ of degree $t$ in two
  variables, with $r_{ij} = r_{ji}$,** such that $p(0,0) = s$, and sends the
  polynomial $f_i(x) = p(x, \alpha_i)$ to player $P_i$ (for
  $i = 1, 2, \ldots, |\mathcal{P}|$). In the V-Share protocol, $r_{ij}$ need not
  be equal to $r_{ji}$, and hence the dealer needs to send both polynomials
  $f_i(x) = p(x, \alpha_i)$ and $g_i(x) = p(\alpha_i, x)$ to player $P_i$.

- In the V-Share protocol, the share of player $P_i$ is $s_i = f_i(0)$, and the
  second dimension of the sharing is not used. In the AV-Share protocol, each
  $P_i$ (locally) outputs $f_i(0)$, which is $P_i$'s share of $s$, and also the
  share-shares $s_{ji} = f_i(\alpha_j)$.

The second phase is implemented using the AgreeSet protocol. The InputShare
protocol and the OutputReconstruction*** protocol are given in Fig. 1.
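The dealer's side of the symmetric two-dimensional sharing can be illustrated with a minimal Python sketch. It is not the AV-Share protocol itself: there is no network, no verification phase and no adversary. The prime modulus, the evaluation points $\alpha_i = i$, and the helper names `symmetric_bivariate`, `row_polynomial`, `evaluate` and `deal` are assumptions made for illustration only.

```python
import random

PRIME = 2_147_483_647  # illustrative field modulus (2^31 - 1), an assumption


def symmetric_bivariate(secret, t, rng):
    """Symmetric (t+1) x (t+1) coefficient matrix r with r[0][0] = secret."""
    r = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            r[i][j] = r[j][i] = rng.randrange(PRIME)
    r[0][0] = secret % PRIME
    return r


def row_polynomial(r, alpha):
    """Coefficients of f(x) = p(x, alpha), lowest degree first."""
    t = len(r) - 1
    return [sum(r[i][j] * pow(alpha, j, PRIME) for j in range(t + 1)) % PRIME
            for i in range(t + 1)]


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial over the field."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def deal(secret, n, t, seed=0):
    """One polynomial f_i per player, with evaluation points alpha_i = i."""
    rng = random.Random(seed)
    r = symmetric_bivariate(secret, t, rng)
    return {i: row_polynomial(r, i) for i in range(1, n + 1)}


polys = deal(secret=1234, n=5, t=1)
# Symmetry of r gives pairwise consistency: f_i(alpha_j) == f_j(alpha_i).
pairwise_ok = all(evaluate(polys[i], j) == evaluate(polys[j], i)
                  for i in polys for j in polys)
```

Because the coefficient matrix is symmetric, a single polynomial per player already determines both dimensions of the sharing, which is why the dealer need not send a second polynomial $g_i$.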

Theorem 2. The pair (AV-Share, OutputReconstruction) is a $t$-resilient AVSS
scheme in a network of $|\mathcal{P}|$ parties, provided that
$|\mathcal{P}| \ge 4t + 1$.

* For a formal specification of an AVSS scheme, see [3].
** This helps one to achieve an efficiency gain of a factor of 2. One can prove
that privacy is not violated by this technique. See [12] for details.
*** Note that this protocol needs neither error correction nor broadcast.

Protocol InputShare[$\mathcal{P}$, $t$]

Code for party $P_i$ with secret $s_i$:

1. Initiate AV-Share$_i$[$\mathcal{P}, t, P_i, s_i$]. For $1 \le j \le |\mathcal{P}|$,
   participate in AV-Share$_j$. Let $f_i^{(j)}(x)$, $s_{ki}^{(j)}$,
   $1 \le k \le |\mathcal{P}|$, be the output of AV-Share$_j$.

2. Execute protocol AgreeSet[$Q, \mathcal{P}, t$] with the boolean predicate
   $Q(j) = 1$ if $P_j$ has completed AV-Share$_j$ successfully. Let CommonSet be
   the output of the protocol.

3. Output (CommonSet, $\{f_i^{(j)}(x), s_{ki}^{(j)}, 1 \le k \le |\mathcal{P}|, j \in$ CommonSet$\}$).

Protocol OutputReconstruction[$\mathcal{P}$, $t$, $P$]

1. Each party $P_i$ sends the polynomial $f_i(x)$ and the share-shares $s_{ji}$
   to party $P$.

2. Party $P$ reconstructs the output as follows:

   For $0 \le k \le t$ do:

      Wait until $(|\mathcal{P}| - 2t + k)$ parties' messages are received.

      Check if at least $(|\mathcal{P}| - 2t)$ $f_i(x)$'s are consistent with
      all but $t$ share-shares.

      If yes, interpolate and output $s$ from $s_i = f_i(0)$ of the consistent
      polynomials.

      If not, go to the next iteration.

Fig. 1. Input Sharing Protocol and Output Reconstruction Protocol
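The consistency check in step 2 of OutputReconstruction can be sketched in Python as follows. The sketch handles a single batch of received messages rather than the iterative waiting of Fig. 1, and it assumes evaluation points $\alpha_i = i$, an illustrative prime modulus, and a fixed example polynomial $p(x,y) = 42 + 7(x+y) + 3xy$. The names `reconstruct`, `interpolate_at_zero` and `honest_message` are hypothetical helpers, not part of the protocol specification.

```python
PRIME = 2_147_483_647  # illustrative field modulus (2^31 - 1), an assumption


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def interpolate_at_zero(points):
    """Lagrange interpolation at x = 0 through [(x, y), ...]."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def reconstruct(messages, n, t):
    """messages[i] = (f_i coefficients, {k: P_i's share-share s_ki}).

    Returns the secret, or None when fewer than n - 2t polynomials pass."""
    consistent = []
    for k, (poly_k, _) in messages.items():
        # f_k(alpha_i) should match the share-share s_ki reported by P_i.
        mismatches = sum(1 for i, (_, shares_i) in messages.items()
                         if evaluate(poly_k, i) != shares_i[k])
        if mismatches <= t:
            consistent.append((k, evaluate(poly_k, 0)))
    if len(consistent) < n - 2 * t:
        return None
    return interpolate_at_zero(consistent)


def honest_message(i, n):
    """Row polynomial of p(x, y) = 42 + 7(x + y) + 3xy at y = i, plus share-shares."""
    poly = [42 + 7 * i, 7 + 3 * i]
    return poly, {k: evaluate(poly, k) for k in range(1, n + 1)}


messages = {i: honest_message(i, 5) for i in range(1, 6)}
bad_poly = [99, 5]  # player 3 deviates from the dealt polynomial
messages[3] = (bad_poly, {k: evaluate(bad_poly, k) for k in range(1, 6)})
secret_out = reconstruct(messages, n=5, t=1)
```

In this example the deviating polynomial disagrees with the share-shares of all four honest players and is discarded, while each honest polynomial disagrees only with the single corrupted player's share-share.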



Proof. We assert the Termination, Correctness and Secrecy requirements of the
above scheme. As per the definition, if the dealer is honest, an AVSS scheme has
to terminate with certainty for all uncorrupted players. In case of a corrupted
dealer, no requirements are posed on the termination. The protocol AV-Share
terminates with probability 1 for an honest dealer (this follows directly from
the proof of the V-Share protocol given in [?]). The protocol
OutputReconstruction terminates with certainty since $P$ will eventually receive
at least $(|\mathcal{P}| - t)$ messages. The correctness of the AV-Share protocol
follows from the results of [?], where it is proved that the shares and
share-shares of the honest players indeed define a unique polynomial of degree
$t$ in two variables. The correctness of the protocol OutputReconstruction can
be proven as follows. Assume that a player hands a bad polynomial
$f_i' \ne f_i$. Of the $(|\mathcal{P}| - t)$ messages received, this polynomial
is inconsistent with the share-shares of at least $(|\mathcal{P}| - 2t)$ players.
At least $(|\mathcal{P}| - 3t)$ players gave their correct share-shares to $P$.
Player $P$ will detect at least $(|\mathcal{P}| - 3t) > t$ inconsistencies and
ignore this polynomial. On the other hand, if $f_i$ is the correct polynomial,
at most $t$ share-shares (those of the corrupted players) will be inconsistent.
Hence $P$ interpolates only correct shares and computes the correct secret $s$.
The secrecy of the protocol AV-Share follows from the fact that the degree of
the chosen polynomial is always greater than the number of corrupted players.
The privacy of the OutputReconstruction protocol is obvious, as no player but
$P$ receives any information. □

C.3 Linear Combination and Multiplication

Let $\mathcal{P}$ be the set of players, at most $t$ of which are corrupted, and
let $\mathcal{L}$ be a linear function. Furthermore, assume that the values
$a, b, \ldots$ are $d$-shared* with polynomials $p(x,y), q(x,y), \ldots$
respectively, where $t \le d < (n - 3t)$ and $n = |\mathcal{P}|$. Note that each
player $P_i$ has $f_i(x) = p(x, \alpha_i)$, $a_{ji} = p(\alpha_i, \alpha_j)$,
$g_i(x) = q(x, \alpha_i)$, $b_{ji} = q(\alpha_i, \alpha_j), \ldots$ (the
$\alpha_i$'s are public random field elements known to all players). Due to the
linearity of $\mathcal{L}$, in order to compute $\mathcal{L}(a, b, \ldots)$, each
player $P_i$ locally computes $h_i = \mathcal{L}(f_i, g_i, \ldots)$ and
$c_{ji} = \mathcal{L}(a_{ji}, b_{ji}, \ldots)$, for $j = 1, \ldots, |\mathcal{P}|$.
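The purely local nature of linear gates can be seen in a short Python sketch. It assumes degree $t = 1$, evaluation points $\alpha_i = i$, an illustrative prime modulus, and symmetric polynomials of the fixed shape $p(x,y) = \text{secret} + u(x+y) + vxy$. The helpers `row` and `linear_combination` are hypothetical names chosen for the example.

```python
PRIME = 2_147_483_647  # illustrative field modulus (2^31 - 1), an assumption


def row(secret, u, v, alpha):
    """f(x) = p(x, alpha) for p(x, y) = secret + u*(x + y) + v*x*y."""
    return [(secret + u * alpha) % PRIME, (u + v * alpha) % PRIME]


def linear_combination(weights, polys):
    """Coefficient-wise sum of weight * polynomial, done locally by one player."""
    out = [0] * max(len(p) for p in polys)
    for w, p in zip(weights, polys):
        for deg, c in enumerate(p):
            out[deg] = (out[deg] + w * c) % PRIME
    return out


a, b = 10, 20
f = {i: row(a, 3, 5, i) for i in (1, 2, 3)}    # sharing of a
g = {i: row(b, 7, 11, i) for i in (1, 2, 3)}   # sharing of b
# Each player combines its own polynomials; no messages are exchanged.
h = {i: linear_combination([2, 3], [f[i], g[i]]) for i in f}
# The shares h_i(0) lie on a line; extrapolate to 0 from players 1 and 2.
recovered = (2 * h[1][0] - h[2][0]) % PRIME
```

The recovered value equals $2a + 3b$, which reflects that the combined polynomials form a sharing of the linear combination of the secrets.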

A $t$-resilient protocol for multiplication starts with $t$-sharings of $a, b$
and ends with a $t$-sharing of $c = a \cdot b$ (if there are no inconsistencies)
or with a partial fault detection (if there exist inconsistencies). The
$t$-adversary's view of the computation is distributed independently of the
initial $t$-sharings for any $t$-adversary. Our implementation of the
multiplication protocol is given in Fig. 2.

C.4 Degree Reduction 

Let $\mathcal{P}$ be the set of players, of which at most $t$ are corrupted, and
assume that a value $s$ is $d$-shared among the players, where
$t \le d < (n - 2t)$ and $n = |\mathcal{P}|$. Assume that $s$ is $d$-shared with
the polynomial $p(x,y)$, with player $P_i$ holding $f_i(x) = p(x, \alpha_i)$ and
$s_{ji} = p(\alpha_i, \alpha_j)$, $1 \le j \le n$. The goal of Degree Reduction
is to transform this sharing into a proper and independent $t$-sharing of $s$.
A $t$-resilient protocol for Degree Reduction for $|\mathcal{P}|$ players starts
with a $d$-sharing of $s$, where $t \le d < (|\mathcal{P}| - 2t)$, and ends with
either a $t$-sharing of $s$ (if there are no inconsistencies) or with a partial
fault detection (if there exist inconsistencies). Also, the $t$-adversary's view
of the protocol is distributed independently of the initial $d$-sharings for any
$t$-adversary. We implement the Degree Reduction protocol as follows: first,
every player $P_i$ $t$-shares $s_i = f_i(0)$ using the protocol EfficientAVShare
(see Fig. 2) and proves that the value shared is in fact $s_i$ using the protocol
ACheckShare (see Fig. 2). Then, every player locally computes the linear
combination according to Lagrange interpolation, which results in a $t$-sharing
of $s$ [?]. The formal description of the implementation of the Degree Reduction
protocol is given in Fig. 2.
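The re-share-and-recombine idea behind Degree Reduction can be sketched in Python for the fault-free case. The sketch uses one-dimensional Shamir sharings instead of the two-dimensional sharing, omits EfficientAVShare's consistency checks and ACheckShare entirely, and assumes evaluation points $\alpha_i = i$ and an illustrative prime modulus. All function names are hypothetical.

```python
import random

PRIME = 2_147_483_647  # illustrative field modulus (2^31 - 1), an assumption


def share(secret, degree, n, rng):
    """Shamir shares {i: q(i)} of secret for a random q of the given degree."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(degree)]
    return {i: sum(c * pow(i, e, PRIME) for e, c in enumerate(coeffs)) % PRIME
            for i in range(1, n + 1)}


def lagrange_at_zero(xs):
    """Coefficients lam[x] with sum(lam[x] * q(x)) = q(0) when deg q < len(xs)."""
    lam = {}
    for xi in xs:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        lam[xi] = num * pow(den, -1, PRIME) % PRIME
    return lam


def reduce_degree(high_shares, t, rng):
    """Turn a degree-d sharing (d < n) into a degree-t sharing of the same secret."""
    players = sorted(high_shares)
    n = len(players)
    lam = lagrange_at_zero(players)
    # Every player i re-shares its own share s_i with a fresh degree-t polynomial.
    sub = {i: share(high_shares[i], t, n, rng) for i in players}
    # Every player j locally combines the sub-shares it received.
    return {j: sum(lam[i] * sub[i][j] for i in players) % PRIME for j in players}


def reconstruct(shares, count):
    """Interpolate the secret from the first `count` shares."""
    xs = sorted(shares)[:count]
    lam = lagrange_at_zero(xs)
    return sum(lam[x] * shares[x] for x in xs) % PRIME


rng = random.Random(1)
high = share(77, 2, 5, rng)        # d = 2 sharing among n = 5 players
low = reduce_degree(high, 1, rng)  # t = 1 sharing of the same secret
```

After the reduction, any $t + 1 = 2$ of the new shares determine the secret, whereas the original sharing needed $d + 1 = 3$.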

Theorem 3. The protocol DegreeReduction is a t-resilient protocol for the de- 
gree reduction functionality. 

Proof. The protocol EfficientAVShare terminates because there are at most $t$
corrupted players, and hence at least $(|\mathcal{P}| - t)$ players will
distribute consistent shares, so the set $G$ is well defined. Along similar
lines, we can see that the protocol ACheckShare terminates as well. The
correctness of the DegreeReduction protocol follows from the following. If the
protocol EfficientAVShare succeeds, then there exists a set of players $G$ of
size at least $(|\mathcal{P}| - t)$ such that each player $P_i \in G$ is
consistent with the share-shares of at least $(|\mathcal{P}| - t)$ players, of
which at least $(|\mathcal{P}| - 2t)$ are honest. Since
$(|\mathcal{P}| - 2t) \ge 2t + 1$, the players in $G$ have verifiable shares and
the $(|\mathcal{P}| - 2t)$ honest players in $G$ define a $t$-sharing of $s$.
Also, if the sharing phase of the protocol ACheckShare succeeds, then indeed the
shares of all players lie on a polynomial of degree $d - 1$, and the degree of
the polynomial $f(x) + x \cdot g(x)$ is at most $d$. Hence, if there exists a set
of players $G$ of size at least $(|\mathcal{P}| - t)$ such that each honest
player $P_i \in G$ is consistent with
$f(\alpha_i) + \alpha_i \cdot g(\alpha_i) = h(\alpha_i)$, then the polynomials
$f(x) + x \cdot g(x)$ and $h(x)$ have at least $(|\mathcal{P}| - 2t) > d$ points
in common, which implies that $f(0) = h(0)$ (i.e., the same secrets). The secrecy
of this protocol is due to the independence of the sharings. □

* A value $s$ is $d$-shared among the players if there exists a polynomial
$p(x,y)$ in two variables of degree $d$ such that $p(0,0) = s$. Each party $P_i$
has $s_i(x) = p(x, \alpha_i)$ and $s_{ji} = p(\alpha_i, \alpha_j)$,
$1 \le j \le n$.

Protocol EfficientAVShare[$\mathcal{P}$, $t$, $P$, $s$]

The dealer chooses a random polynomial
$h(x,y) = \sum_{i=0}^{t} \sum_{j=0}^{t} r_{ij} x^i y^j$ of degree $t$ in two
variables, with $r_{ij} = r_{ji}$, such that $h(0,0) = s$, and sends the
polynomial $f_i(x) = h(x, \alpha_i)$ to player $P_i$ (for
$i = 1, 2, \ldots, |\mathcal{P}|$).

Code for party $P_i$:

1. Upon receiving $f_i(x)$, for each $1 \le j \le |\mathcal{P}|$, send
   $f_i(\alpha_j)$ to party $P_j$.

2. Upon receiving $(|\mathcal{P}| - t)$ messages $m_{ji}$, check if
   $m_{ji} = f_i(\alpha_j)$. If equality holds for all $m_{ji}$, send
   CheckMessage$_i$ := "OK"; else, send CheckMessage$_i$ := $j$, where $j$
   denotes the smallest index such that the value received from $P_j$ was not
   equal to $f_i(\alpha_j)$, to all parties.

3. Execute protocol AgreeSet[$Q, \mathcal{P}, t$] with the boolean predicate
   $Q(j) = 1$ if CheckMessage$_j$ has been received. Let $G$ be the output of
   this protocol. If CheckMessage$_j$ = "OK" for all $j$ such that $P_j \in G$,
   then the protocol succeeds with output
   $(f_i(x), f_i(\alpha_j), 1 \le j \le |\mathcal{P}|)$. Else, set
   FaultDetected$_i$ := TRUE.

Protocol ACheckShare[$\mathcal{P}$, $d$, $t$, $P$, $s$]

1. The dealer sets $g(x) := (h(x) - f(x))/x$ and distributes shares of $g(x)$,
   which is of degree $d - 1$, to every player $P_i$ using the protocol
   EfficientAVShare. If this fails, then the whole verification protocol fails.

2. Every player $P_i$ checks that
   $f(\alpha_i) + \alpha_i \cdot g(\alpha_i) = h(\alpha_i)$. If consistent, the
   player sends CheckBit$_i$ := 1, else he sends CheckBit$_i$ := 0, to all
   players.

3. Every player executes protocol AgreeSet[$Q, \mathcal{P}, t$] with $Q(j) = 1$
   if CheckBit$_j$ has been received. Let $G$ be the output of this protocol. If
   CheckBit$_j$ = 1 for all $j$ with $P_j \in G$, then the verification was
   successful. Else, set FaultDetected$_i$ := TRUE.

Protocol DegreeReduction[$\mathcal{P}$, $d$, $t$, $s$]

Code for party $P_i$:

1. Initiate EfficientAVShare$_i$[$\mathcal{P}, t, P_i, s_i$]. For
   $1 \le j \le |\mathcal{P}|$, participate in EfficientAVShare$_j$. If
   FaultDetected$_i$ = FALSE then let
   $(f_i^{(j)}(x), s_{ki}^{(j)}, 1 \le k \le |\mathcal{P}|)$ be the output of
   EfficientAVShare$_j$. Else terminate with output 'NULL'.

2. Run ACheckShare$_i$[$\mathcal{P}, d, t, s_i$]. For
   $1 \le j \le |\mathcal{P}|$, participate in ACheckShare$_j$. If
   FaultDetected$_i$ = TRUE then terminate with output 'NULL'.

3. Execute protocol AgreeSet[$Q, \mathcal{P}, t$] with the boolean predicate
   $Q(j) = 1$ if $P_j$ has completed EfficientAVShare$_j$ successfully, to get
   the output, say CommonSet.

4. Locally compute the linear combination of the shares and share-shares
   received from players in CommonSet, as in [?].

MUL[$\mathcal{P}$, $t$, $a$, $b$]

Each player $P_i$ does the following for $j = 1, \ldots, n$:

- Computes $c_i(x) = f_i(x) \cdot g_i(x)$.

- Evaluates $c_{ji} = a_{ji} \cdot b_{ji}$.

- Calls DegreeReduction[$\mathcal{P}, 2t, t, c$], which starts with a
  $2t$-sharing of $c$.

- If the above DegreeReduction call outputs 'NULL' then terminates with output
  'FAIL', else ends with the $t$-sharing of $c$.

Fig. 2. Input Sharing with Fault Detection, Degree Reduction and Multiplication



C.5 Segment Fault Localization 

The purpose of fault localization is to find out which players are corrupted or,
because agreement about this can usually not be reached, at least to narrow down
the set of players containing the cheaters. The output of an
$(r,p)$-localization is a set $\mathcal{D}$ with $|\mathcal{D}| = p$ players,
guaranteed to contain at least $r$ corrupted players.

The $n$ shares of a $d$-shared value define a unique codeword of a code with
Hamming distance $(n - d)$. Hence over an asynchronous network, fewer than
$(n - t - d)$ faults can be detected, or fewer than $(n - t - d)/2$ faults can
be corrected. Therefore, computing linear functions requires
$t \le d < n - 3t$. For our lazy re-sharing procedure, to preserve the ability
to continue the computation, it is required that after a (sequence of)
$(r,p)$-localizations and player eliminations (without degree reduction),
$t < n/4$ still holds, and the degree $d$ of each sharing still satisfies
$t \le d < n - 3t$. It immediately follows that $3r \ge p$. In the sequel,
we use both $(1,2)$-localizations and $(1,3)$-localizations.
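The two invariants can be checked mechanically. The following Python sketch assumes the reading $t < n/4$ and $t \le d < n - 3t$ of the conditions above and tests them after one $(r,p)$-localization, where $n$ drops by $p$ and $t$ drops by $r$. The function names are hypothetical.

```python
def resilience_preserved(n, t, r, p):
    """True if t' < n'/4 holds after eliminating p players, r of them corrupted."""
    return 4 * (t - r) < (n - p)


def degree_bound_preserved(n, t, d, r, p):
    """True if t' <= d < n' - 3t' holds for an unchanged sharing degree d."""
    return (t - r) <= d < (n - p) - 3 * (t - r)


# Exhaustive check for small parameters: both kinds of localization used in
# the protocol keep the invariants whenever they held before elimination.
all_ok = all(
    resilience_preserved(n, t, r, p) and degree_bound_preserved(n, t, d, r, p)
    for n in range(5, 60)
    for t in range(1, (n - 1) // 4 + 1)
    for d in range(t, n - 3 * t)
    for (r, p) in [(1, 2), (1, 3)]
)
```

The degree bound is the binding one: the upper limit changes from $n - 3t$ to $n - 3t + (3r - p)$, so it does not shrink exactly when $3r \ge p$. A $(1,4)$-localization, for instance, can violate it.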

In our protocol, if a segment has detected a fault, the first faulty
sub-protocol is found and we invoke the corresponding fault localization
procedure. In what follows, we outline the fault localization methodology for
the various sub-protocols that are allowed to fail with a partial fault
detection, viz. EfficientAVShare, ACheckShare and DegreeReduction.

1. Sharing Protocol (EfficientAVShare): From the corresponding common set $G$,
   among all the players who complained about an inconsistency, the uncorrupted
   parties can agree on the player $P_i$ with the smallest index $i$. From
   $P_i$'s CheckMessage$_i$ all the parties know some $P_j$ that $P_i$
   complained about. The parties execute a BA protocol (with the
   CheckMessage$_i$ that $P_i$ sent to them as input) to agree on a single
   $P_j$. Then every player sets $\mathcal{D} := \{P, P_i, P_j\}$, where $P$ is
   the corresponding dealer. It is obvious that all players find the same set
   $\mathcal{D}$, and at least one player in $\mathcal{D}$ must be corrupted,
   hence $\mathcal{D}$ is a $(1,3)$-localization.

2. Verifying Shares Protocol (ACheckShare): Let $P_i$ be the player with the
   smallest index in the common set $G$ who complained. Then the set
   $\mathcal{D} := \{P, P_i\}$ is the $(1,2)$-localization.

3. Degree Reduction Protocol (DegreeReduction): Failure of the DegreeReduction
   protocol is due to the failure of either or both of the above sub-protocols.
   Then the same $\mathcal{D}$ is determined as in the first failed sub-protocol.
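The selection rule of case 1 can be written down as a small Python sketch. It assumes that all players already hold the same view of the check messages (in the protocol this is what the BA step provides) and represents players by integer indices. The function name and the dictionary layout are hypothetical.

```python
def localize_sharing_fault(dealer, common_set, check_messages):
    """Return the localization {dealer, P_i, P_j}, or None if nobody complained.

    check_messages[i] is "OK" or the index j that player i complained about.
    """
    complainers = [i for i in sorted(common_set) if check_messages[i] != "OK"]
    if not complainers:
        return None
    i = complainers[0]       # complaining player with the smallest index
    j = check_messages[i]    # the player that P_i accused
    return {dealer, i, j}


localization = localize_sharing_fault(
    dealer=1,
    common_set={2, 3, 4, 5},
    check_messages={2: "OK", 3: 5, 4: 2, 5: "OK"},
)
```

At least one of the three players is corrupted: an honest dealer and an honest accused player always produce matching values, so an honest complainer can only be contradicted by a fault in one of the other two.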

D The Top-Level Protocol 

Let $\mathcal{F} : \mathbb{F}^n \to \mathbb{F}$ be given by an arithmetic circuit
$A$. Our protocol for $t$-securely computing
$\mathcal{F}(x_1, x_2, \ldots, x_n)$ is described in Fig. 3.

Theorem 4. The protocol AsyncSecureCompute allows a set of $n$ players, with
at most $t < n/4$ of them being corrupted, to securely compute a function over a
finite field W , with a message complexity (excluding broadcast) of 0{mn^ Ig |F|)-|- 
mn^lgn) and 0{mn^) Byzantine Agreement simulations. 

The complexity analysis of our protocol is given in Section E. This result should
be compared with the most efficient perfect asynchronous secure computation
protocol known before, namely the protocol of [3]. The protocol of [5] is briefly
analyzed in Appendix A.

E Complexity Analysis of our Protocol 

Complexity Analysis of Initial Sharing: In the AV-Sharci\P ,t, P] protocol the 
distribution phase communicates MC = ntlg|F| -|- n^lg|F| bits. In the ver- 
ification phase each party needs to send a star in the graph which requires 
0{nlgn) bits. So, this phase has MC = 0(n^lgn) and BAG = 0{nlgn). The 
InputShare[V , t] protocol runs AV-Sharci n times followed by an AgreeSet pro- 
tocol. Hence, MC = 0(n^lg|F| -l-n'^lgn) and BAG = O(n^lgn). 

Complexity Analysis of Receiving Output: The OutputReconstruction[P ,t, P] 
protocol requires each party to send a polynomial of degree t = 0{n) and n 
other field elements. This requires MC = 

Complexity Analysis of our Efficient VSS: In the Efficient AVSharCi protocol the 
distribution phase communicates MC = ntlg |F| -|- n^lg |F| bits. The pairwise 
consistency checks need another MC = 0{n'^ Ig n) bits to be sent. The agreement 
on G has BAC = 0(n). The ACheckShare protocol runs an Efficient AVSharci 
followed by bits of communication and an AgreeSet protocol. Hence, MC = 
0(n^ Ig |F| -I- Ign -I- n^) and BAC = 0(n). 

Complexity Analysis of Degree Reduction: The DegreeReduction protocol amounts 
to running each of the above two protocols, namely Efficient AV Share and ACheck- 
Share n times followed by an AgreeSet protocol. This requires MC = 0{n^ Ig |F| 
-I- n^lgn) bits and BAC = 0{n'^) bits. 

VSS with Fault Localization uses BAC = $O(\lg n)$.

Protocol AsyncSecureCompute[$n, A, x_1, \ldots, x_n$]

Code for party $P_i$:

1. Initialization: Set $\mathcal{P} := \{P_1, P_2, \ldots, P_n\}$ and
   $t \le \lceil n/4 \rceil - 1$.

2. Input Sharing: Set
   $(C_{share}, \{f_i^{(j)}(x), s_{ki}^{(j)}, 1 \le k \le |\mathcal{P}|, j \in C_{share}\})$
   := InputShare[$\mathcal{P}, t$].
   The secret $s_i$ of player $P_i$ in the above sub-protocol is $x_i$. For a
   line $l$ in the circuit, let $l^{(i)}$ denote the share of party $P_i$ in the
   value of this line. If $l$ is the $j$-th input line of the circuit, then: Set
   $l^{(i)} := f_i^{(j)}(x)$ if $j \in C_{share}$ and $l^{(i)} := 0$ otherwise.

3. Computation:

   For each segment of the circuit:

      Repeat

         For each $P_i$: Set FaultDetected$_i$ := FALSE.

         For each gate $g$ in the segment:

            Each player $P_i$ acts as follows:

            Wait until the shares of all input lines of $g$ are computed.

            If $g$ is linear with output line $l$ and input lines $l_1, l_2, \ldots$:

               Set $l$ := LinearCombination[$\mathcal{P}, t, g, l_1, l_2, \ldots$].

               Players $P_i$ with FaultDetected$_i$ = TRUE use random shares.

            If $g$ is a multiplication gate with output line $l$ and input
            lines $l_1, l_2$:

               If $l_k$, $k = 1$ or $2$, is shared with degree $d$ greater than $t$:

                  Call DegreeReduction[$\mathcal{P}, d, t, l_k$].

                  Every player $P_i$ with FaultDetected$_i$ = TRUE uses random
                  shares in the sub-protocol. If the sub-protocol outputs
                  'NULL', then $P_i$ sets FaultDetected$_i$ := TRUE.

               Set $l$ := Multiplication[$\mathcal{P}, t, g, l_1, l_2$].

                  Every player $P_i$ with FaultDetected$_i$ = TRUE uses random
                  shares in the sub-protocol. If the sub-protocol outputs
                  'FAIL', then $P_i$ sets FaultDetected$_i$ := TRUE.

         For each $P_i$, broadcast FaultDetected$_i$.

         Set $C$ := AgreeSet[$Q, \mathcal{P}, t$], with $Q(j) = 1$ for $P_j$ if
         it received the broadcast from $P_j$.

         If at least one player in $C$ has FaultDetected$_i$ = TRUE:

            Each $P_i$ broadcasts the index of the first sub-protocol that failed.

            Agree on the smallest sub-protocol that failed.

            Invoke the fault localization procedure for that sub-protocol to
            get $\mathcal{D}$.

            Set $\mathcal{P} := \mathcal{P} \setminus \mathcal{D}$ and $t := t - 1$.

      Until (FaultDetected$_i$ = FALSE for all $i \in C$).

   For each player $P$ needing output:

      Call OutputReconstruction[$\mathcal{P}, t, P$].

Fig. 3. Protocol for Asynchronous Secure Computation

Analysis of a Segment with $l$ Multiplications: Every multiplication gate takes
up to three degree reductions (re-sharing of the arguments and the actual
multiplication), where the re-sharings can be performed in parallel. In every
degree reduction only the actual computation with partial fault detection is
performed. At the end of the segment, during (strict) fault detection, $n$ bits
are broadcast and an AgreeSet is run. If there are faults, extended fault
localization is performed. There are at most $O(nl)$ partial fault detections,
hence to localize the first one which reported some failure, $O(n \lg(nl))$ bits
are broadcast. The total
complexities are MC = 0{lrA\g |F| + IrAlgn) and BAC = + n\g{nl)). 

Cumulative Complexity Analysis: The protocol AsyncSecureCompute[$n, A,
x_1, \ldots, x_n$] uses the InputShare[$\mathcal{P}, t$] sub-protocol followed by
the computation of $m/l$ segments, each of which has $l$ multiplications. In all,
at most $t$ segments may fail and require repetition. Finally, the protocol
OutputReconstruction[$\mathcal{P}, t, P$] is performed $O(n)$ times, possibly in
parallel. The overall complexity is as follows.
MC = {0{n^ Ig |F| Ig n} -h { (f -f t) 0{ln^ Ig |F| -h In^ Ig n)} -h {0{n^ Ig n)} 
and BAC = {n^lgn} + {(x + i) {0{ln’^)}. When setting I = ^ and t = 0{n), 
we have MC = C(mn^ lg|F|) + mn^lgn) and BAC = 0{mn^). 
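The effect of the segment length on the number of segment executions can be tabulated with a short Python sketch. It only counts executions (the $m/l$ segments plus at most $t$ repetitions, as stated above) and deliberately makes no assumption about the per-segment cost. The function name `segment_runs` is hypothetical.

```python
def segment_runs(m, l, t):
    """Worst-case segment executions: ceil(m / l) segments plus at most t repeats."""
    return -(-m // l) + t


n, t, m = 13, 3, 1300
runs_long_segments = segment_runs(m, m // n, t)   # segment length l = m / n
runs_single_gates = segment_runs(m, 1, t)         # one multiplication per segment
```

With segments of length $m/n$ there are at most $n + t$ executions, so the strict fault detection at the end of a segment is paid only $O(n)$ times, whereas segments of a single gate would pay it for every multiplication.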

F Conclusions 

In this paper we study the communication complexity, which is the bottleneck 
in most distributed applications, of perfect asynchronous secure computation 
protocols. We rigorously distinguish between the actual protocol with fault de- 
tection and additional procedures for fault correction. This, when combined with 
the technique of player elimination for reducing the costs of repetitive malicious 
faults, allows a substantial reduction in the overall communication complexity
of the protocol.

Appendix A: Brief Complexity Analysis of [5]

The AVSS scheme is implemented using the protocol V-Share which communi- 
cates MC = 0(n^lg |F| -hn^lgn) bits and BAC = O(nlgn) bits. The V-Recon 
protocol takes only C>(lg|F|) bits for a single player to reconstruct the out- 
put. The input sharing phase uses n V- Share’s and an ACS sub-protocol which 
sends 0{n? Ig^ n) bits and broadcasts 0{n) bits, leading to a total complexity 
of MC = 0(n^lg|F| -l-n'^lgn) and BAC = O(n^lgn). The computation phase 
uses the protocol BMUL for evaluating a single multiplication gate, in which 
each party runs t V- Share’s followed by an ACS, another V- Share, an AIS which 
communicates 0(n^lg^ n -I- n^lg |F|) bits and broadcasts 0{in?) bits, and n V- 
Recon’s. This leads to the following overall complexities for m multiplication 
gates. MC = 0{rnrA\g\W\ -\- rnrAlgn) and BAC = 0{mrA\gn). 
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Abstract. We study a distributed adversarial model of computation 
in which the faults are non-stationary and can move through the network (like
viruses) as well as non-threshold (there is no specific bound on the number of
corrupted players at any given time). We show how to construct multiparty
protocols that are perfectly secure against such generalized mobile adversaries.
The key element in our solution is devising non-threshold proactive verifiable
secret sharing schemes that generalize the secret sharing schemes known in the
literature.



A Introduction 

Consider a fully connected network of n players (processors) who do not trust 
each other. Nevertheless they want to compute some agreed function of their 
inputs in a secure way. This is the secure multiparty computation problem. 
The players’ distrust in each other and in the underlying network is usually 
modeled via an adversary that has control over some of the players and commu- 
nication channels. Adversaries are classified according to their computational re- 
sources (limited (cryptographic) or unlimited (information theoretic)), their con- 
trol over communication (secure, insecure, or unauthenticated channels), their 
control over corrupted players (eavesdropping (passive), fail-stop, or Byzantine 
(active)), their mobility (static, adaptive, or mobile) and their corruption ca- 
pacity (threshold or non-threshold). In the information theoretic model one can 
distinguish between protocols with small (unconditional) or zero failure proba- 
bility (perfect). 

A.1 Contributions of this Paper

We propose a framework for constructing non-threshold proactive verifiable se- 
cret sharing schemes that strictly generalize the secret sharing schemes known in 
the literature. We study the notion of mobile adversary structures in the field of 
multiparty computation, and we construct protocols that provide perfect secu- 
rity (with zero error probability) tolerating such generalized mobile adversaries. 
The primary emphasis of this paper is on the existence of protocols. The
designed protocols provide perfect security, that is, we consider a passive or
an active generalized mobile adversary with unbounded computing power, in the
classical model with pairwise synchronous secure communication channels between
players, but not assuming a broadcast channel (which can be simulated, see [?]).

B. Roy and E. Okamoto (Eds.): INDOCRYPT 2000, LNCS 1977, pp. 130-142, 2000.

(c) Springer-Verlag Berlin Heidelberg 2000

The threshold models of adversaries were found insufficient for capturing
general scenarios of mutual trust and distrust among the players. This was
illustrated in [7,8]. But note that even the protocols that are designed to
withstand generalized (non-threshold) adversaries cannot tolerate the (more
realistic) adversary that gradually breaks the security of the protocol by
"releasing" an already corrupted player and corrupting another player in its
place, that is, the adversary corrupts a different set of players at different
times during the computation. Since even a momentary access to a player
(processor) can expose that player's share for the rest of the duration of the
protocol, it is important to provide protocols that are secure against such
strong determined adversaries.

In this work, we define the security of a multiparty computation protocol
with respect to a mobile adversary structure, a monotone set of subsets of the
players, where the adversary is allowed to corrupt the players of one set in this
mobile adversary structure in any single time period,* and is allowed to corrupt
the players of different sets in this mobile adversary structure in different
time periods. A mobile adversary structure is monotone in the sense of being
closed with respect to taking subsets.
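Monotonicity of an adversary structure is easy to state in code. The following Python sketch represents a structure as a set of frozensets of player indices, builds the closure under taking subsets from a list of maximal corruptible sets, and checks the closure property. The names `downward_closure` and `is_monotone` are hypothetical, and the example sets are chosen for illustration only.

```python
from itertools import combinations


def downward_closure(maximal_sets):
    """All subsets of the given maximal corruptible sets."""
    closed = set()
    for s in maximal_sets:
        items = sorted(s)
        for k in range(len(items) + 1):
            closed.update(frozenset(c) for c in combinations(items, k))
    return closed


def is_monotone(structure):
    """True if removing any single player from a member yields a member.

    Closure under single removals implies closure under all subsets."""
    return all(s - {x} in structure for s in structure for x in s)


structure = downward_closure([{1, 2}, {3}])
```

Here the adversary may corrupt players 1 and 2 together, or player 3 alone, in one time period. The structure is non-threshold: it tolerates a set of size two but not every set of size two.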

A.2 The Approach

We follow the solutions proposed in many of the previous efforts in the
literature, e.g. [2]. The function to be computed by the protocol is, without
loss of generality, specified by a circuit over a finite field
$(\mathbb{F}, +, *)$. The protocol can easily be constructed based on
subprotocols for providing input, computing arbitrary linear functions of shared
values, multiplying shared values (using degree reduction), and receiving
output. In our scheme, the input is shared using the non-threshold proactive
verifiable secret sharing protocol described in Section B, which also
facilitates output reconstruction. This step is the key to our solution. The
other subprotocols, for computing linear functions and for multiplications, are
adaptations to our setting of the ones already known in the literature.

B Non-threshold Proactive (Verifiable) Secret Sharing 

Non-Threshold Proactive (Verifiable) Secret Sharing strictly generalizes the
secret sharing schemes known in the literature. In such a sharing scheme, the
initial sharing is based on an access structure, and the other core properties
of proactive secret sharing, like periodic share renewal and share recovery, are
also incorporated. Given a monotone access structure
$\mathcal{S} \subseteq 2^{\mathcal{P}}$, the problem is to design a secret
sharing protocol that is secure against a generalized mobile adversary that is
characterized by the mobile adversary structure $\mathcal{M}$, where
$\mathcal{M}$ is, by definition, the complement of the closure (with respect to
taking supersets) of $\mathcal{S}$. Our construction of protocols for
non-threshold proactive verifiable secret sharing is based on the subprotocols
for the following tasks: Initial Secret Sharing, Periodic Share Renewal, Share
Recovery, and Share Reconstruction. In our solution, the dealer first shares the
secret $s$ using the initial secret sharing scheme, which withstands static
non-threshold adversaries. This sharing is then "proactivized" to tolerate
generalized mobile adversaries, that is, after the initialization, at the
beginning of every time period, all honest players trigger an update phase
consisting of the share recovery subprotocol followed by the share renewal
subprotocol, and erase all the share information of the previous time period.
This "ensures" that the scheme is secure against mobile adversaries. For
simplicity, we assume that if a player is corrupted during an update phase, then
the player is corrupted during both periods adjacent to that update phase.

* Time is divided into time periods which are determined by the common global
clock.

B.1 Initial Secret Sharing

To the mobile adversary structure $\mathcal{M}$, we attach an attribute called the
corruptibility index, CI, which is the minimum number of time periods required for
any adversary characterized by $\mathcal{M}$ to corrupt every player at least
once². We show how to construct the initial secret sharing subprotocol for any
given $\mathcal{M}$ provided CI > 4 (we will see later that if CI ≤ 4 then it is
impossible to design secure protocols for distributed computations). We start by
defining a structure called the share distribution tree (SDT), a tree in which
each node holds some information (its share etc.) about the secret $s$ that is to
be distributed among the players in $\mathcal{P}$. The root node has full
information about $s$. Let $L$ denote the set of leaf nodes of the SDT. Every
player $P_i$ in $\mathcal{P}$ is assigned a set of leaf nodes in $L$, and these
sets are mutually disjoint. That is, there exists a Player Share Function $F$
that maps every player to some subset of leaf nodes in such a way that no leaf
node is attributed to more than one player. $P_i$’s share of the secret $s$ is
the set of shares defined by the leaf nodes in $F(P_i)$. The cardinality of a
player $P_i$ is defined as the number of elements in the set $F(P_i)$.
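Since the adversary corrupts one set of the structure per time period, the corruptibility index equals the smallest number of sets of the structure whose union is the whole player set. The sketch below computes it by brute force from the maximal sets; it is illustrative only, and the function name and the example structures are assumptions, not part of the paper.

```python
from itertools import combinations
import math

def corruptibility_index(players, maximal_sets):
    """Minimum number of adversary sets needed to cover all players.

    Returns math.inf when the structure does not cover the player set.
    Only maximal sets need to be listed, since the structure is monotone.
    """
    players = frozenset(players)
    sets = [frozenset(s) for s in maximal_sets]
    if frozenset().union(*sets) != players:
        return math.inf
    for k in range(1, len(sets) + 1):
        for choice in combinations(sets, k):
            if frozenset().union(*choice) == players:
                return k
    return math.inf

# Hypothetical threshold-like structure: any single one of five players.
ci = corruptibility_index(range(1, 6), [{i} for i in range(1, 6)])
```

The search is exponential in the number of maximal sets, which is acceptable for small examples; it is not meant as an efficient algorithm.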



A Suitable SDT Topology and Corresponding $F(\cdot)$: A protocol will work
correctly on any SDT that satisfies the following property, irrespective of the
set in $\mathcal{M}$ that has been corrupted. An adversary can choose to corrupt
any set in the structure $\mathcal{M}$; say he chooses the set
$\{P_1, P_2, \ldots, P_k\}$. Let $L' = \{l_1, l_2, l_3, \ldots, l_n\}$ be the
corresponding corrupted leaf nodes, that is, $L' = \bigcup_{i=1}^{k} F(P_i)$. Next,
any non-leaf node $w$ in the SDT, with children $u_1, u_2, \ldots, u_m$, is
iteratively marked as corrupted if more than of the $u_i$’s are corrupted. This
process is repeated for all the nodes in the SDT in a bottom-up fashion (start
from the lowest level and go up to the root). The required property is that,
notwithstanding which set in the structure $\mathcal{M}$ has been corrupted, the
root of the SDT should not be marked as corrupted. It can be immediately seen
that constructing a suitable SDT has a large number of degrees of freedom. As a
first step, we note that the player simulation tree of [8] (based on the ideas of
[TEE]) satisfies the required property. We use their construction to get the
topology of the SDT.

² CI = ∞ if $\mathcal{M}$ does not cover $\mathcal{P}$.



Implementation: Without much security concern, we assume that for the given
mobile adversary structure $\mathcal{M}$, the (same) topology of a suitable SDT
and its corresponding $F(\cdot)$ function are known to all players. First, the
dealer (locally) distributes the secret $s$ across the SDT as follows: Assign $s$
to the root (at level 0) of the SDT. Each (non-leaf) node at level $l$ with share
$s_l$ recursively shares it with its $k = 4$ children³ at level $l + 1$ using a
variation of Shamir’s 1-threshold secret sharing scheme PEI, extended to two
dimensions. That is, for each node $u$ at level $l$ the dealer chooses at random
a polynomial $p_u(x, y) = s_l + a_l(x + y) + b_l(x \cdot y)$ and assigns the
polynomial $f_{u,m}(x) = p_u(x, m) = s_l + a_l m + x(a_l + b_l m)$ to node $v_m$,
which is $u$’s $m$-th child at level $l + 1$, for $m = 1, \ldots, 4$. The share of
the node $v_m$ is $s_m = f_{u,m}(0) = s_l + a_l m$. Its share-shares are
$s_{jm} = f_{u,m}(j) = s_l + a_l m + j(a_l + b_l m)$, for $j = 1, \ldots, 4$. Only
the share, not the share-shares, is used for further recursive sharing⁴.
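One level of this two-dimensional sharing can be sketched as follows. This is a minimal illustration, assuming the field is GF(P) for a fixed prime P; the names `share_node` and `reconstruct` are hypothetical. It computes the four shares $f_{u,m}(0)$ and the sixteen share-shares $f_{u,m}(j)$ of a single node and reconstructs the node's value from any two shares.

```python
import random

P = 2_147_483_647  # a prime; GF(P) stands in for the field F (assumption)
K = 4              # number of children per SDT node

def share_node(s, rng):
    """Share s among K children with p(x, y) = s + a*(x + y) + b*(x*y)."""
    a, b = rng.randrange(P), rng.randrange(P)
    # Share of child m is f_m(0) = s + a*m.
    shares = {m: (s + a * m) % P for m in range(1, K + 1)}
    # Share-share (j, m) is f_m(j) = s + a*m + j*(a + b*m).
    share_shares = {(j, m): (s + a * m + j * (a + b * m)) % P
                    for j in range(1, K + 1) for m in range(1, K + 1)}
    return shares, share_shares

def reconstruct(shares, i, j):
    """Value at 0 of the line through (i, shares[i]) and (j, shares[j])."""
    inv = pow((j - i) % P, -1, P)
    return (shares[i] * j - shares[j] * i) * inv % P

rng = random.Random(0)
shares, share_shares = share_node(42, rng)
```

Because $p_u$ is symmetric in $x$ and $y$, the share-share with indices $(j, m)$ equals the one with indices $(m, j)$, which is the pairwise consistency that children of a node can compare.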

The dealer distributes the share-shares of all the internal nodes
$u_1, u_2, \ldots, u_N$ of the SDT. For distributing the share-shares associated
with the node $u_d$, the dealer does the following. Let $w$ be the parent of $u_d$
and let $v_1, v_2, v_3$ and $v_4$ be $w$’s children such that $u_d = v_r$ (i.e.
$u_d$ is the $r$-th child of $w$). The dealer distributes the four share-shares
associated with the internal node $u_d$, $d = 1, 2, \ldots, N$, by recursively
sharing (using Shamir’s secret sharing scheme) each of these four share-shares
$(x_{1r}, x_{2r}, x_{3r}, x_{4r})$ through the subtree rooted at $u_d$. That is, for
each internal node $u_d$, the dealer chooses at random four polynomials (in one
dimension), $q_{u_d,j}(y) = x_{jr} + a_{u_d,j}\, y$, $j = 1, \ldots, 4$, and assigns
the share-share-shares $x_{jr} + a_{u_d,j}\, m$ to the node that is $u_d$’s $m$-th
child, for $m = 1, \ldots, 4$. If any of $u_d$’s children is not a leaf node, then
that child’s share-share-shares are further recursively shared (using random
polynomials) in the same manner. Apart from this, the dealer associates with all
the $v_i$’s the share-share-shares that resulted from sharing the share-share
$x_{ir}$ through the subtree rooted at $u_d$. For all the other non-leaf $v_i$’s,
the dealer, instead of just sending the share-share-shares, shares the
share-share-share (resulting from sharing $x_{ir}$) of every leaf node in the
subtree rooted at $u_d$ along the subtree rooted at $v_i$ (using random
polynomials of degree one). We call the values received by the leaves of the
subtree rooted at $v_i$ share-coefficients. Furthermore, the dealer sends the
random polynomials used to distribute, through the subtree rooted at $v_i$, the
share-share-share of some leaf node $l$ to the player associated with $l$. After
thus locally spreading the secret in the SDT, the dealer sends, through secure
communication channels, to each player $P_i \in \mathcal{P}$ the shares,
share-shares, share-share-shares, share-coefficients and polynomial coefficients
associated with the leaf nodes that occur in $F(P_i)$.

³ $k$ is always 4 in the case where the SDT topology of [8] is used, but the
implementation can be modified to make it compatible with any other (more
efficient) topology.

⁴ If the share-shares too are recursively shared in a manner similar to the shares,
the communication complexity increases geometrically.

When the adversary is active (Byzantine), for each non-leaf node $w$ in the SDT
with children $u_1, u_2, u_3, u_4$, the following check is adopted. Let $x_{ji}$
denote the share-shares assigned to the node $u_i$, $i, j = 1, \ldots, 4$. Each pair
of nodes $u_a, u_b$ (for $1 \le a, b \le 4$) checks whether $x_{ab} = x_{ba}$.

Verifying $x_{ab} = x_{ba}$ for nodes $u_a, u_b$: Case 1 (Both $u_a$ and $u_b$ are
leaf nodes): The player $P_b$ (associated with $u_b$) sends $x_{ab}$ to $P_a$
(associated with $u_a$), who does the verification and broadcasts the result
(consistent or inconsistent).

Case 2 (Node $u_a$ is a leaf node but not $u_b$): The players
$P_{b_1}, \ldots, P_{b_e}$ associated with the leaves of the subtree $T_b$ rooted at
$u_b$ send their share-share-shares of $x_{ab}$ to the player $P_a$ (associated
with $u_a$), who broadcasts the result.

Case 3 (Node $u_b$ is a leaf node but not $u_a$): The player $P_b$ associated with
$u_b$ shares $x_{ab}$ through the subtree $T_a$ rooted at $u_a$ exactly as the
dealer did ($P_b$ has the information about the share-share-shares of the leaves
of $T_a$). Each leaf node of $T_a$ simply verifies locally that the
share-share-share it holds due to $x_{ab}$ is the same as that sent by $P_b$. In
case of any mismatch, the corresponding player broadcasts a doubt and that leaf
is marked as doubtful. If any of the leaves are doubtful, then iteratively, each
non-leaf node $w$ of $T_a$ that has two or more of its children doubtful is also
marked doubtful. If $u_a$ is doubtful, then the result is considered
inconsistent; otherwise it is consistent.

Case 4 (Neither $u_a$ nor $u_b$ is a leaf node): For each leaf node $l_{b_i}$,
$i = 1, \ldots, e$, in $T_b$, let $t_{b_i}$ denote its respective
share-share-share. Each player $P_{b_i} \in \{P_{b_1}, \ldots, P_{b_e}\}$ (the
players corresponding to the leaves of $T_b$) shares the value $t_{b_i}$ among the
players $P_{a_1}, \ldots, P_{a_c}$ (the players associated with the leaves of
$T_a$) along $T_a$ using randomly chosen polynomials of degree one with
coefficients, say, $\hat{c}_{j'}$, $j' = 1, 2, \ldots$, one for each non-leaf node
in the subtree rooted at $u_a$. We know that the share-share-share $t_{b_i}$ has
already been shared along $T_a$ by the dealer using polynomials whose
coefficients, say $c_{j'}$, $j' = 1, 2, \ldots$, are known to $P_{b_i}$. Let
$sc_{a_k}^{b_i}$ be the share-coefficient of $P_{a_k}$, $k = 1, \ldots, c$,
corresponding to the sharing of the share-share-share $t_{b_i}$. The player
$P_{b_i}$ also sends to each of the players $P_{a_k}$, $k = 1, \ldots, c$, some
additional check values (explained below). Let $r_1, r_2, \ldots, r_t$ be the path
from node $u_a$ to the leaf node $l_{a_k}$ associated with $P_{a_k}$, and let
$c_q$ (given by the dealer) and $\hat{c}_q$ (used now), $q = 1, \ldots, t$, be the
corresponding sharing polynomials’ coefficients. The player $P_{b_i}$ sends to
$P_{a_k}$ the values $(c_q - \hat{c}_q)$, for $q = 1, \ldots, t$. Let
$t_{a_k}^{b_i}$ and $EC_{a_k}^{b_i}$ denote the respective values that the player
corresponding to the node $l_{a_k}$ received from the player $P_{b_i}$ when
sharing $t_{b_i}$, $i = 1, \ldots, e$, $k = 1, \ldots, c$. Next, the players
$P_{a_1}, \ldots, P_{a_c}$ work with the received $t_{a_k}^{b_i}$’s and
$EC_{a_k}^{b_i}$’s as follows. Each player $P_{a_k}$ locally checks whether
$sc_{a_k}^{b_i} = t_{a_k}^{b_i} + \sum_{q=1}^{t} r_q \cdot (c_q - \hat{c}_q)$ and
broadcasts the result. Once these checks have been completed, it is easy to
verify whether $x_{ab} = x_{ba}$. For this verification, each of the leaf nodes
$l_{a_1}, l_{a_2}, \ldots, l_{a_c}$ simply verifies locally that the values
$sc_{a_k}^{b_i}$ and $t_{a_k}^{b_i} + \sum_{q=1}^{t} r_q \cdot (c_q - \hat{c}_q)$ are
the same for a sufficiently “large” number of $i$’s. By sufficiently “large”, we
mean the following. Working from the bottom of $T_b$, the player $P_{a_k}$
assigns the value 1 to the leaf node $l_{b_i}$ if
$sc_{a_k}^{b_i} = t_{a_k}^{b_i} + \sum_{q=1}^{t} r_q \cdot (c_q - \hat{c}_q)$, and 0
otherwise. Iteratively, each non-leaf node $w$ in $T_b$ is assigned the value 1
if three or more of its children are assigned the value 1, and 0 otherwise. If
the node $u_b$ is assigned 0, then the player $P_{a_k}$ broadcasts a doubt and
that node ($l_{a_k}$) is marked as doubtful. If any of the leaf nodes in $T_a$ is
doubtful, then iteratively, each non-leaf node $w$ of $T_a$ that has two or more
of its children marked as doubtful is also marked doubtful. If $u_a$ is doubtful,
then the result of $x_{ab} = x_{ba}$ is considered inconsistent; otherwise it is
consistent. This ends the verification procedure.
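The bottom-up "doubtful" marking used in Cases 3 and 4 can be sketched in a few lines. This is only an illustration under assumptions made here: the tree is represented as nested Python lists with leaf names as strings, and the subtree `T_a` below is a made-up example.

```python
def is_doubtful(node, doubtful_leaves):
    """Bottom-up marking on a subtree.

    A leaf is doubtful if its player broadcast a doubt; an internal node is
    doubtful if two or more of its children are doubtful.
    """
    if isinstance(node, str):          # leaves are represented by their names
        return node in doubtful_leaves
    count = sum(is_doubtful(child, doubtful_leaves) for child in node)
    return count >= 2

# Hypothetical subtree T_a: four children, the first of which is internal.
T_a = [["l1", "l2", "l3", "l4"], "l5", "l6", "l7"]
consistent = not is_doubtful(T_a, {"l1", "l2"})
```

In this example two doubts inside the same inner subtree make only that inner node doubtful, so the root stays clean and the comparison is still reported as consistent.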

After the above procedure has been repeated for all the children of each
non-leaf node $w$, the following is done. In case of any inconsistency, the player
broadcasts the corresponding index and the dealer answers by broadcasting the
corresponding correct values. If more than one node triggers an inconsistency for
some node $u_i$, or discovers that the dealer’s answers contradict its own values,
an accusation is registered by the players, which the dealer defends publicly by
broadcasting the polynomial that it originally had assigned to $u_i$ in the
sharing phase; this may lead to further accusations. If two or more accusations
are registered (due to child nodes of the non-leaf node $w$), then the node $w$ is
marked damaged. If any of the nodes have been marked damaged, then iteratively,
each non-leaf node $w$ of the SDT that has two or more of its children marked as
damaged is also marked damaged. If the root of the SDT is marked damaged, or if
the dealer did not answer all the inconsistencies and accusations, the dealer is
deemed corrupted and a default sharing of the secret 0 is taken. The initial
secret sharing subprotocol is thus implemented. The following theorem summarizes
the outcome of this subsection.

Theorem 1. The above construction constitutes an initial secret sharing
subprotocol secure against $\mathcal{A}_{static}$ for any given mobile adversary
structure $\mathcal{M}$ of corruptibility index CI. For an active adversary we
require that CI > 4.

Proof. See [H| □ 



Periodic Share Renewal After the initialization, each player $P_i \in \mathcal{P}$
holds his share(s), share-shares, share-share-shares, share-coefficients and
polynomial coefficients of the secret $s$ defined by the leaves that occur in
$F(P_i)$.

Passive Adversary: The shares computed in the period $t$ are denoted by
using the superscript $(t)$. To renew the shares at period $t = 1, 2, \ldots$, we
modify the ideas in m to the non-threshold setting. In our system, each player
$P_i$ secret shares the value 0 using the initial secret sharing subprotocol. That
is, each player $P_i$ selects at random polynomials (for each node $u$)
$h_{u,i}(x, y) = b_{u,i}(x \cdot y) + a_{u,i}(x + y) + secret$ (when $u$ is the
root, $secret = 0$) as the secret sharing polynomials, and for each node $u$
selects at random four additional polynomials $g_{u,d}(y) = c_{u,i}(y) + secret$,
$d = 1, \ldots, 4$, for the distribution of the share-shares of node $u$. The
player $P_i$ then locally constructs an SDT$_i$ and sends to each other player
$P_j \in \mathcal{P}$ the relevant shares, share-shares, share-share-shares,
share-coefficients and polynomial coefficients. Let $h_{qi,m}$ (and $h_{qij,m}$,
$h_{qij,(u)}$, $v_{qi,(u)}$ and $pc_{qi,g,(u)}$) denote $P_i$’s share (respectively
the four share-shares, $j = 1, \ldots, 4$, the share-share-share(s) of the
share-share of the internal node $u$, and the relevant share-coefficients and
polynomial coefficients) sent by player $P_q$ corresponding to the leaf node
$l_m$, for $1 \le m \le n_i$. Each player $P_i$ (locally) computes its new share
($s^{(t)}_{i,m}$), share-shares ($s^{(t)}_{ij,m}$), share-share-shares
($s^{(t)}_{ij,(u)}$), share-coefficients ($v^{(t)}_{i,(u)}$) and polynomial
coefficients ($pc^{(t)}_{i,g,(u)}$) as follows, and erases all the variables not
pertaining to the new time period:

$s^{(t)}_{i,m} = s^{(t-1)}_{i,m} + \sum_{q=1}^{n} h_{qi,m}$,
$s^{(t)}_{ij,m} = s^{(t-1)}_{ij,m} + \sum_{q=1}^{n} h_{qij,m}$,
$s^{(t)}_{ij,(u)} = s^{(t-1)}_{ij,(u)} + \sum_{q=1}^{n} h_{qij,(u)}$,
$v^{(t)}_{i,(u)} = v^{(t-1)}_{i,(u)} + \sum_{q=1}^{n} v_{qi,(u)}$,
and $pc^{(t)}_{i,g,(u)} = pc^{(t-1)}_{i,g,(u)} + \sum_{q=1}^{n} pc_{qi,g,(u)}$,

where $j = 1, \ldots, 4$, $g = 1, 2, \ldots$, and $m = 1, \ldots, n_i$.
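The additive update can be illustrated on a deliberately simplified setting: a single flat level with four players and univariate 1-threshold sharing over GF(P), instead of the full SDT with share-shares and share-coefficients. This simplification, the prime P, and all function names are assumptions of this sketch.

```python
import random

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)
N = 4              # players, identified with evaluation points 1..N

def share(secret, rng):
    """Degree-1 sharing: player i receives secret + a*i."""
    a = rng.randrange(P)
    return [(secret + a * i) % P for i in range(1, N + 1)]

def renew(old_shares, rng):
    """Each player deals a fresh sharing of 0; every new share is the old
    share plus the sum of the update shares received for that position."""
    updates = [share(0, rng) for _ in range(N)]
    return [(old_shares[i] + sum(u[i] for u in updates)) % P for i in range(N)]

def reconstruct(shares, i, j):
    """Secret f(0) from the shares of players i and j (1-based)."""
    yi, yj = shares[i - 1], shares[j - 1]
    return (yi * j - yj * i) * pow((j - i) % P, -1, P) % P

rng = random.Random(7)
old = share(1234, rng)
new = renew(old, rng)
```

The renewed shares lie on a fresh random line with the same constant term, so they still reconstruct the secret while being unrelated to shares captured in an earlier period.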

Active Adversary: In the above passive share renewal protocol, an active
adversary controlling a player can cause the destruction of the secret by dealing
inconsistent share updates or just by choosing a polynomial with constant term
≠ 0. We describe a method that circumvents this problem. Each player $P_i$
selects at random polynomials (for each node $u$ in the SDT)
$h_{u,i}(x, y) = b_{u,i}(x \cdot y) + a_{u,i}(x + y) + secret$ as the secret sharing
polynomial, and for each node $u$ selects at random four additional polynomials
$g_{u,d}(y) = c_{u,i}(y) + secret$, $d = 1, \ldots, 4$, for the distribution of the
four share-shares of node $u$. Let $z_{qi,m}$ ($z_{qij,m}$, $z_{qij,(u)}$,
$v_{qi,(u)}$ and $pc_{qi,g,(u)}$) denote $P_i$’s share (respectively the four
share-shares, $j = 1, \ldots, 4$, the share-share-share(s) of the share-share of
the internal node $u$, the relevant share-coefficients and the polynomial
coefficient values of SDT$_q$) sent by player $P_q$ using SDT$_q$, corresponding
to the leaf node $l_m$, $1 \le m \le n_i$. Let $r_m$ be the first step in the path
from the root to the leaf node $l_m$, that is, the node $l_m$ is in the subtree
rooted at the root’s $r_m$-th child. First, $P_i$ locally computes
$H_{qi,m} = r_m \cdot z_{qi,m}$, $H_{qij,m} = r_m \cdot z_{qij,m}$,
$H_{qij,(u)} = r_m \cdot z_{qij,(u)}$, $V_{qi,(u)} = r_m \cdot v_{qi,(u)}$ and
$PC_{qi,g,(u)} = r_m \cdot pc_{qi,g,(u)}$. Next, the player $P_i$ applies the degree
reduction protocol (as will be explained in the sequel) to $H_{qi,m}$,
$H_{qij,m}$, $H_{qij,(u)}$, $V_{qi,(u)}$ and $PC_{qi,g,(u)}$ to get the corresponding
degree-reduced values. Then the new shares are calculated by adding the sum of
these values to the original share (as in the passive case). The main result of
this subsection is summarized in the following theorem.



Theorem 2. The new shares, share-shares, share-share-shares, share-coefficients
and polynomial coefficients computed by the above procedure at the end of the
update phase correspond to the secret $s$, and the active (or passive) adversary
$\mathcal{A}_{active}$ (or $\mathcal{A}_{passive}$) that at any single time period
corrupts no more than one set in the mobile adversary structure $\mathcal{M}$
learns nothing about the secret.



Proof. See El 



□ 



Share Recovery Scheme In a non-threshold proactive secret sharing system,
the participating players must be able to check whether the shares of other
participating players have been corrupted or lost, and restore the correct
share if necessary. Otherwise, an adversary could cause the loss of the secret by
gradually destroying the shares.

Detection and Localization of the lost/corrupt nodes: For each non-leaf node $w$
in the SDT with children $u_1, u_2, u_3$ and $u_4$ with share-shares $x_{ij}$, each
pair of nodes $u_a, u_b$ (for $1 \le a, b \le 4$) checks whether $x_{ab} = x_{ba}$.
For this, the players adopt the verification procedure outlined in Subsection
B.1. Each non-leaf node $w$ whose children triggered two or more inconsistencies⁵
is detected as lost, in the passive model, or corrupted, in the active model.
Also, for every non-leaf node that is detected as lost/corrupted, all the nodes
in the subtree rooted at that node are marked as lost/corrupted as well. We
denote these lost/corrupted nodes by $B = \{u_{b_1}, u_{b_2}, \ldots, u_{b_d}\}$,
where $d$ is the total number of corrupted nodes in the current time period. It
is not possible to categorize the corrupted leaf nodes exactly and with perfect
privacy, and to agree on them globally. Therefore, all the leaf nodes (i.e. their
associated players) undergo the recovery procedure illustrated below.

Share Recovery: In this section we show how to recover the corrupted
shares, share-shares, share-share-shares, share-coefficients and polynomial
coefficients of the corrupted players.

Passive Adversary: Recovering the shares: To recover the share $s_{x_i,m}$ of
player $P_{x_i}$ corresponding to the leaf node $l_m$, where $l_m$ is a child of
$w$ in the SDT (let $v_1, v_2$ and $v_3$ be the other children of $w$, and let
$l_m$ be the $r$-th child of $w$), the following method is used, depending on
whether any corrupted node exists in the path from $l_m$ to the root.

Case 1: No node in the path from $l_m$ to the root is in $B$: Each player $P_y$,
$y = 1, \ldots, n$, secret-shares some random secret of his choice using SDT$_y$
(of the same topology as the SDT used initially) in such a way that the share of
the node $l_m$ is zero (this is achieved using the scheme outlined later in the
sequel). After this step, each player $P_y$ has $n$ additional random shares,
share-shares, share-share-shares, share-coefficients and polynomial coefficients
(corresponding to the SDT$_y$’s, $y = 1, \ldots, n$) for each of the (original)
values respectively. Next, each player $P_y$ randomizes his (original) share,
share-shares, share-share-shares, share-coefficients and polynomial coefficient
values by respectively adding to them the sum of the $n$ corresponding random
ones. Subsequently, player $P_{x_i}$ first reconstructs (as will be explained
below) the randomized share of each (other) child of node $w$, namely $v_z$,
$z = 1, 2, 3$, and then interpolates these randomized shares to recover the share
$s_{x_i,m}$ of node $l_m$. The former step is executed as follows. Each player
that is associated with a leaf of the subtree rooted at $v_z$ sends his
randomized shares, share-shares and share-share-shares corresponding to the nodes
$v_1, v_2$ and $v_3$ to the player $P_{x_i}$, who then interpolates the share
$s_{x_i,m}$ from the random shares of all those $v_z$’s that are consistent with
all but at most one random share-share.

Case 2: Some node in the path from $l_m$ to the root is in $B$: Let $w'$ be the
first corrupted node in the path from the root to the leaf node $l_m$. It is
clear that such a $w' \ne$ root (eventually) exists, since the root is maintained
uncorrupted. Let $l'_i$, $i = 1, 2, \ldots$, be the leaf nodes of the subtree
rooted at $w'$ and let $l'_1 = l_m$. Let the node $u$ be $w'$’s parent and let
$u'_i$, $i = 1, \ldots, 4$, be $u$’s children, one of which is $w'$. Then, by the
definition of $w'$, the node $u$ is uncorrupted. And since $u$ is not corrupted,
we can recover the information associated with $w'$ as done in case 1⁶. The only
difference from case 1 in recovering the share information associated with the
node $w'$ is that each player $P_y$, $y = 1, \ldots, n$, secret-shares some random
secret of his choice using SDT$_y$ in such a way that the share of the node $w'$
is zero (instead of the share of some leaf node being zero). After this step, the
subsequent steps are all the same as in case 1. After the recovery, the player
$P_{x_i}$ re-shares the recovered information of $w'$ through the subtree rooted
at $w'$ using the procedure outlined in the implementation of the initial sharing
scheme (see Subsection B.1).

⁵ Since the sharing was originally 1-threshold, two inconsistencies imply corruption.

Thus, the shares of any corrupted player can be recovered. It is easily seen
that a similar randomize-then-reconstruct procedure can be adopted to recover the
share-shares, share-share-shares and share-coefficients of player $P_{x_i}$. The
only major difference is that each player $P_y$, $y = 1, \ldots, n$, secret-shares
some random secret of his choice using SDT$_y$ in such a way that the
corresponding share-share (respectively share-share-share, share-coefficient) of
the relevant node is zero. Once all the share-coefficients (of that node) are
recovered, the players associated with the leaves of the subtree rooted at that
node re-share their new share-share-shares through the corresponding subtrees,
thereby in effect “recovering” (actually, newly creating) the polynomial
coefficients.
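The randomize-then-reconstruct idea can be seen on a flat, single-level simplification with four players and univariate 1-threshold sharing over GF(P). This is a sketch under those assumptions, with hypothetical function names; it is not the full SDT procedure. Every player deals a random line that vanishes at the position of the lost share, the helpers add these masks to their own shares, and the recovering player interpolates at its own position.

```python
import random

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)
N = 4

def share(secret, rng):
    a = rng.randrange(P)
    return [(secret + a * i) % P for i in range(1, N + 1)]

def interpolate_at(points, t):
    """Value at x = t of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = points
    slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    return (y1 + slope * (t - x1)) % P

def recover(shares, lost, rng):
    """Recover the share at position `lost` (1-based) without exposing the
    helpers' true shares to the recovering player."""
    # Each player deals a random line c*(x - lost), whose share at `lost` is 0.
    cs = [rng.randrange(P) for _ in range(N)]
    def mask(x):
        return sum(c * (x - lost) for c in cs) % P
    helpers = [x for x in range(1, N + 1) if x != lost]
    randomized = [(x, (shares[x - 1] + mask(x)) % P) for x in helpers]
    return interpolate_at(randomized[:2], lost)

rng = random.Random(11)
shares = share(99, rng)
recovered = recover(shares, 3, rng)
```

The masked line agrees with the original one only at the lost position, so the recovering player learns its own share and nothing else about the sharing.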

⁶ Since $w'$ is already corrupted, without any loss in security, the recovery of
the share information associated with the node $w'$ can be performed by any
player, in particular by the player $P_{x_i}$.

Secret sharing such that some required information associated with some given
node $w$ is zero: Let $j_1, j_2, \ldots, j_r$ be the path from the root of the SDT
to the node $w$, that is, the node $w$ is the root’s $j_1$-th child’s $j_2$-th
child’s ... $j_r$-th child, and let $w_l$, $l = 1, \ldots, r$, denote these
intermediate nodes respectively (note that $w = w_r$). Let the sharing polynomial
used at node $w_l$, $l = 1, \ldots, r$, be
$p_l(x, y) = b_l(x \cdot y) + a_l(x + y) + secret$, and let the polynomials used at
$w_l$ for sharing the four share-shares (or share-share-shares) associated with
$w_l$ (respectively associated with the share-share of $u$) be
$q_{l,j,(u)}(y) = d_l \cdot y + secret$, $j = 1, \ldots, k$ (that is, $w_l = u$ when
sharing the share-shares). The dealer, who wants the share of node $w$ to be
zero, chooses to share the secret $s$ where
$s = -\left(\sum_{l=1}^{r} a_l \cdot j_l\right)$. To get the share-share (with
index $d$) of node $w$ to be zero,
$s = -\left(\left(\sum_{l=1}^{r} a_l \cdot j_l\right) + d \cdot (a_r + j_r \cdot b_r)\right)$.
For the share-share-share of the share-share of node $u$ present in node $w$ to be
zero, the dealer chooses the value of $s$ to be
$s = -\left[\left(\sum_{i=r_u+1}^{r} j_i \cdot d_i\right) + \left(\sum_{l=1}^{r_u} a_l \cdot j_l\right) + (a_{r_u} \cdot m) + (b_{r_u} \cdot m)\right]$,
where $j_1, j_2, \ldots, j_{r_u}$ is the path from the root to the node $u$. For
the share-coefficient of node $w$ (in the subtree rooted at $u_a$), due to the
share-share-share of node $u$ (in the subtree rooted at $u_b$) for checking
equality of the share-shares of $u_a$ and $u_b$, to be zero, the dealer chooses
$r$ random polynomials of degree one with coefficients $e_1, \ldots, e_r$ to share
$sss_u$ along the subtree rooted at $u_a$ such that
$\left(\sum_{i=1}^{r} e_i \cdot j_i\right) + sss_u = 0$, where the
share-share-share of node $u$ is $sss_u$ and $j_1, j_2, \ldots, j_r$ is the path
from $u_a$ to $w$.
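The first of these choices, forcing the share of a given node to zero, can be checked numerically. The sketch below follows one path from the root, assuming (as in the sharing rule of Subsection B.1) that the child with index $j$ of a node holding value $v$ and linear coefficient $a$ receives $v + a \cdot j$. The prime P, the function names, and the example path are assumptions of this illustration.

```python
import random

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)

def share_at_node(secret, path, coeffs):
    """Share held by the node reached from the root along `path`, where
    coeffs[l] is the linear coefficient a_l used at step l."""
    value = secret
    for j, a in zip(path, coeffs):
        value = (value + a * j) % P
    return value

def secret_for_zero_share(path, coeffs):
    """Root secret s = -(sum of a_l * j_l), which makes the node's share 0."""
    return (-sum(a * j for a, j in zip(coeffs, path))) % P

rng = random.Random(3)
path = [2, 4, 1]                          # hypothetical child indices j_1..j_r
coeffs = [rng.randrange(P) for _ in path]
s = secret_for_zero_share(path, coeffs)
```

Any other root secret shifts the node's share by the same amount, which is why the choice of $s$ is forced once the coefficients are fixed.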

Active Adversary: We note that using the above passive share recovery protocol
in the presence of an active adversary gives rise to two problems. First, an
active adversary controlling a player can cause the destruction of the secret by
dealing inconsistent random shares, or just by choosing the random secret sharing
polynomial such that the share at the required node $w$ is not equal to zero.
Second, in case 2 (where there exists some corrupted node $w'$ in the path from
$l_m$ to the root), the recovery and re-sharing of the information in the node
$w'$ cannot be performed by an arbitrary player, since if that player were
corrupted, it could re-share incorrectly. The second problem can be solved by
reconstructing towards all the leaves of the subtree rooted at $w'$ and running a
Byzantine Agreement among them to ‘fix’ a sharing. Our method to solve the first
problem is similar to the method used in the active share renewal protocol. Each
player $P_q$, $q = 1, \ldots, n$, shares a randomly chosen value, giving rise to
the SDT$_q$. The player $P_q$ sends to each other player $P_j \in \mathcal{P}$ the
shares, share-shares, share-share-shares, share-coefficients and polynomial
coefficients associated with the leaf nodes that occur in $F(P_q)$, denoted
$z_{qi,m}$, $z_{qij,m}$, $z_{qij,(u)}$, $v_{qi,(u)}$ and $pc_{qi,g,(u)}$,
$g = 1, \ldots$, respectively. To get a sharing with the share of node $w$ as zero
(where $w$ is the $r$-th child of $u$; let $w_1, w_2, w_3$ and $w_4$ be the
children of $u$, with $w = w_r$), each player $P_{t_i}$ associated with the leaf
node $l_{t_i}$, $i = 1, \ldots, T$, that is in the subtree rooted at $w_x$,
$x \in \{1, \ldots, 4\}$, locally computes $H_{qt_i,m} = (x - r) \cdot z_{qt_i,m}$,
$H_{qt_ij,m} = (x - r) \cdot z_{qt_ij,m}$,
$H_{qt_ij,(u)} = (x - r) \cdot z_{qt_ij,(u)}$ and
$V_{qt_i,(u)} = (x - r) \cdot v_{qt_i,(u)}$. After this step, the players
$P_{t_i}$, $i = 1, \ldots, T$, apply the degree reduction protocol for the subtree
rooted at $u$ to these values and compute the random share, share-shares,
share-share-shares, share-coefficients and polynomial coefficients in a manner
similar to the share renewal protocol. To get a sharing with the share-share of
node $w$ as zero, the procedure is almost the same as in the first case above,
with the only variation being that each player $P_{t_i}$ uses $(x - d)$ instead
of $(x - r)$. To get a sharing with the share-share-share of the share-share of
node $u$ that is with node $w$ to be zero, only the share-share-shares and the
share-coefficients need to be multiplied by $(x - r)$.



Theorem 3. If the adversary $\mathcal{A}_{active}$ (or $\mathcal{A}_{passive}$)
compromises no more than one set in the mobile adversary structure $\mathcal{M}$,
with corruptibility index CI > 4, in any time period, then every recovering player
that follows the above protocol recovers its correct share, share-shares,
share-share-shares, share-coefficients, and polynomial coefficients, while the
adversary $\mathcal{A}_{active}$ (or $\mathcal{A}_{passive}$) learns nothing about
the original secret shared among the players.



Secret Reconstruction Let $P$ be the designated player supposed to receive
a value $s$ that is non-threshold proactively shared among the players. Let $N_l$
denote the last level of the SDT (the root is at level 0).




140 K. Srinathan and C. Pandu Rangan 



1. Every player $P_i \in \mathcal{P}$ sends to $P$ his shares, his share-shares
(for $j = 1, \ldots, 4$ and $m = 1, 2, \ldots, n_i$), and his share-share-shares of
the share-shares of the internal nodes.

2. The player $P$ recursively reconstructs the shares for all the nodes at levels
$N_l - 1$ down to 0: For each node $w$ at level $N_l - 1$, let $u_1, u_2, u_3$ and
$u_4$ be its children (clearly, all $u_i$’s are leaf nodes). He reconstructs $w$’s
share by Lagrange interpolation of the shares of the $u_i$’s that are consistent
with all but at most one share-share. This is followed by the reconstruction of
the share-shares of each node $w$ at level $N_l - 1$ from the
share-share-shares, $j = 1, \ldots, 4$. This is done by recursively reconstructing
the share-share of $w$ from the share-share-shares of only those $u_i$’s that
were (previously) found consistent with all but at most one share-share. Player
$P$ can then interpolate a unique straight line on these values⁷ and pick the
constant term as the required share-share of $w$. Player $P$ is now ready with
the shares and share-shares of all nodes at level $N_l - 1$ and can in a similar
fashion interpolate those of nodes at level $N_l - 2$ and so on, until the
required secret $s$ is recovered.
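The interpolation step with one possibly corrupted value can be sketched as follows, again on a single level with four points on a line over GF(P). This illustrates only the "find three collinear values" idea; the prime P and the function names are assumptions, and the consistency filtering by share-shares described above is not modelled.

```python
from itertools import combinations

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)

def collinear(points):
    """True if three (x, y) points lie on one line over GF(P)."""
    (x1, y1), (x2, y2), (x3, y3) = points
    return ((y2 - y1) * (x3 - x1) - (y3 - y1) * (x2 - x1)) % P == 0

def robust_reconstruct(values):
    """values maps x in 1..4 to a value; at most one value may be wrong.
    Returns the constant term of the line through three collinear points,
    or None if no three points are collinear."""
    for triple in combinations(sorted(values.items()), 3):
        if collinear(triple):
            (x1, y1), (x2, y2) = triple[0], triple[1]
            return (y1 * x2 - y2 * x1) * pow((x2 - x1) % P, -1, P) % P
    return None

honest = {x: (7 + 5 * x) % P for x in range(1, 5)}
tampered = dict(honest)
tampered[3] = (tampered[3] + 1) % P      # one corrupted value
```

With a single wrong value, any triple that contains it fails the collinearity test, so the three honest points are the only triple accepted.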



B.2 Non-Threshold Proactive Verifiable Secret Sharing: The 
Combined Protocol 

Putting together all the above pieces, we get our full protocol for non-threshold
proactive verifiable secret sharing tolerating generalized mobile adversaries.

1. The dealer shares the secret s using the initial secret sharing protocol. 

2. At the beginning of every time period, the update phase is triggered, con- 
sisting of 

(a) the lost share detection and localization. 

(b) the lost/corrupted share recovery protocol. 

(c) the share renewal protocol. 

3. Share reconstruction protocol (whenever required). 

C Secure Multiparty Computation 

The function to be computed by the protocol is, without loss of generality,
specified by a circuit over a finite field $(\mathbb{F}, +, *)$. We follow the
solutions in the previous literature in the sense that the protocol consists of
three stages: providing input, computation (computation of arbitrary linear
functions of shared values, and multiplication of shared values using degree
reduction), and receiving output. In our scheme, the input is provided using the
non-threshold proactive verifiable secret sharing protocol described in the
previous section, which also facilitates output reconstruction.

⁷ The adversary may corrupt only the share-share-shares, leaving the share-shares
intact, leading the player $P$ to use a corrupted share-share-share. This is not
much of a problem because, even if all four share-share-shares (with one
corrupted) were to be used, $P$ can easily find the three values that are
collinear.

Linear Combination: Let $\mathcal{L}$ be a linear function and assume that the
values $a, b, \ldots$ are non-threshold proactively shared. Due to the linearity
of $\mathcal{L}$, in order to get a sharing of $c = \mathcal{L}(a, b, \ldots)$, it
is sufficient for each player to locally compute the linear combination of its
shares (and share-shares, share-share-shares, share-coefficients and polynomial
coefficients) of $a, b, \ldots$.
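Locality of linear operations is easy to see on a flat simplification with four players and univariate 1-threshold sharing over GF(P). The sketch below is illustrative; the prime P and the function names are assumptions, and only plain shares (not share-shares or share-coefficients) are modelled.

```python
import random

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)
N = 4

def share(secret, rng):
    a = rng.randrange(P)
    return [(secret + a * i) % P for i in range(1, N + 1)]

def reconstruct(shares, i, j):
    yi, yj = shares[i - 1], shares[j - 1]
    return (yi * j - yj * i) * pow((j - i) % P, -1, P) % P

def linear_combination(weights, sharings):
    """Each player combines its own shares locally; no communication."""
    return [sum(w * sh[k] for w, sh in zip(weights, sharings)) % P
            for k in range(N)]

rng = random.Random(5)
a_shares, b_shares = share(10, rng), share(20, rng)
c_shares = linear_combination([3, 5], [a_shares, b_shares])  # c = 3a + 5b
```

The combined shares lie on the line obtained by combining the two sharing lines with the same weights, so they form a valid sharing of the combined value.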

Multiplication: Assume that two values $a$ and $b$ are non-threshold proactively
shared. First, each player locally multiplies the corresponding shares,
share-shares, and internal node share-shares of the two sharings. This defines a
sharing of $c = a \cdot b$, but the polynomials used are of double the degree.
Next, the players perform degree reduction. The key idea of degree reduction is
to non-threshold proactively verifiably share the shares (using the initial
sharing protocol on the same SDT topology), prove that the value shared in the
previous step is indeed the product share, and then locally compute the linear
combination to get the degree-reduced sharing. This protocol can be implemented
using a straightforward adaptation of the protocol of [2], improved by [3, to the
non-threshold setting.
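The effect of degree reduction can be illustrated with the well-known reshare-and-recombine technique on a flat, passive, four-player simplification over GF(P). This is a sketch of that generic technique, not the verifiable protocol of [2] adapted in the paper: the proofs of correct resharing are omitted, and the prime P and all function names are assumptions.

```python
import random

P = 2_147_483_647  # prime modulus; GF(P) stands in for the field (assumption)
N = 4

def share(secret, rng):
    a = rng.randrange(P)
    return [(secret + a * i) % P for i in range(1, N + 1)]

def reconstruct(shares, i, j):
    yi, yj = shares[i - 1], shares[j - 1]
    return (yi * j - yj * i) * pow((j - i) % P, -1, P) % P

def lagrange_at_zero(xs):
    """Coefficients lam with sum(lam[k] * h(xs[k])) == h(0) when deg h < len(xs)."""
    lam = []
    for i in xs:
        num, den = 1, 1
        for j in xs:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        lam.append(num * pow(den, -1, P) % P)
    return lam

def multiply(shares_a, shares_b, rng):
    # Local products lie on a degree-2 polynomial with constant term a*b.
    products = [(x * y) % P for x, y in zip(shares_a, shares_b)]
    xs = [1, 2, 3]                       # three points determine degree 2
    lam = lagrange_at_zero(xs)
    # Players 1..3 reshare their product with fresh degree-1 sharings.
    reshared = [share(products[x - 1], rng) for x in xs]
    # Each player recombines the sub-shares it received, locally.
    return [sum(l * r[k] for l, r in zip(lam, reshared)) % P for k in range(N)]

rng = random.Random(1)
c_shares = multiply(share(6, rng), share(7, rng), rng)
```

The recombination is itself a linear combination of degree-one sharings, which is why the result is again a degree-one sharing of the product.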

From the results of [8], it is clear that the lower bounds of 3 (in the passive
model) and 4 (in the active model) are necessary. In [Hj these bounds are proved
as necessary for their adversary model (“stationary” generalized adversary), and
since our adversary is a generalization (“non-stationary” generalized adversary)
of theirs, these lower bounds trivially hold in our case too. In this work, we
have shown that these bounds are also sufficient in the case of active
adversaries by constructing secure multiparty protocols that are resilient
against generalized mobile active adversaries characterized by a mobile adversary
structure of corruptibility index CI > 4. Due to the “rebooting” sort of
mechanisms involved in removing the adversary’s effect on a corrupted player,
there is a loss of share information even in the case of a passive adversary,
which leads to our solution requiring CI > 4 even for the passive case.

Theorem 4. Let Cl denote the corruptibility index of a mobile adversary struc- 
ture A4. A player set V can compute any function perfectly M-securely in the 
active model, where players corrupted during an update phase are considered as 
corrupted in both the adjacent time periods, if and only if Cl > 4.



D Conclusion 



We have studied the problem of secure multiparty computation in a generalized 
model where the adversary is, simultaneously, both mobile [2] and non-threshold 
m- We have proposed constructions that, for any admissible adversary, yield 
secure protocols offering perfect security. Moreover, as a by-product we have 
developed a non-threshold verifiable proactive secret sharing scheme that may 
have further applications on its own. 
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Abstract. The work is concerned with identification of bad signatures 
in a sequence which is validated using batching. Identification codes (id-codes)
are defined and their general properties are investigated. The
generic construction for a wide range of id-codes is given and its instantiation
using mutually orthogonal Latin squares is described. Hierarchical
identification is studied for two cases: when the identification procedure
uses a family of id-codes and when there is a single underlying id-code.
Remarks about future research conclude the work.



Keywords: Digital Signatures, Batch Verification, Identification Codes. 



A Introduction 

The reliance of e-Commerce on digital money has a dramatic impact on the 
computing load imposed on the bank. The bank has become the focal point 
where all electronic money (digital signatures) are flowing. Observe that be- 
fore the transaction is approved, the electronic money must be validated. Batch 
verification is an attractive short cut for signature validation saving time and 
computing resources. It is applicable whenever the verifier gets a large number of 
digital signatures generated by the same signer, provided the signature scheme exhibits
the homomorphic property that allows signatures to be validated in batches at the
expense of a single exponentiation.

If the batch passes the validation, all signatures are considered correct and are 
accepted. If however, the batch fails to pass the validation test, the verifier must 
identify invalid signatures in the batch. Clearly, rejection of the whole batch is 
not an option. A natural question arises: how can invalid signatures in the
batch be identified so that the valid signatures can be accepted? Additionally, one would
expect the identification process to be as efficient as possible.



B. Roy and E. Okamoto (Eds.): INDOCRYPT 2000, LNCS 1977, pp. 143-154, 2000.
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B Background 

Batch verification makes sense if the signatures in a batch are related or gener- 
ated by the same signer. There are two types of signatures which can be batched: 
RSA signatures and DSA (DSS) signatures. The RSA signatures use the fixed 
exponent (public key of the signer) for verification. Assume that we have n messages
and their signatures. The signatures can be verified independently at the
expense of n exponentiations. The batch containing all signatures can be verified
at the expense of a single exponentiation plus (n - 1) modular multiplications.

DSA signatures are based on exponentiation when the base is fixed and 
publicly known. Again n signatures can be verified one by one at the expense 
of n exponentiations. The batch of n signatures is validated using a single
exponentiation and (n - 1) modular additions.
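The operation count for the RSA case can be made concrete with a minimal sketch. It assumes textbook RSA without padding and a toy key (the modulus, exponents and function names below are illustrative and not taken from the paper); it only shows that one exponentiation and n - 1 multiplications replace n exponentiations, and it is not a secure batch test by itself.

```python
from math import prod

# Toy textbook-RSA key, far too small for real use (assumption of this sketch).
N, E, D = 3233, 17, 2753   # N = 61 * 53, E * D = 1 mod 3120


def sign(m: int) -> int:
    """Textbook RSA signature s = m^D mod N (no padding)."""
    return pow(m, D, N)


def verify_one(m: int, s: int) -> bool:
    """Individual verification: one exponentiation per signature."""
    return pow(s, E, N) == m % N


def verify_batch(messages, signatures) -> bool:
    """Batch verification: multiply all signatures, then exponentiate once."""
    return pow(prod(signatures) % N, E, N) == prod(messages) % N


messages = [42, 99, 1234, 7]
signatures = [sign(m) for m in messages]
clean_ok = verify_batch(messages, signatures)

# Corrupt one signature; the batch as a whole no longer verifies.
tampered = list(signatures)
tampered[2] = (tampered[2] + 1) % N
dirty_ok = verify_batch(messages, tampered)
```

A failed batch only says that something is wrong; it does not say which signature is bad, which is the identification problem studied in the rest of the work.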

Bellare, Garay and Rabin pp developed verification tests which are secure 
against any attacker. The security of the test is measured by the probability 
that a contaminated batch passes it, causing the verifier to accept all invalid or
bad signatures contained in the batch. The probability of slipping bad signatures
through the test can be traded against efficiency.

The problem we address in this work is an efficient identification of bad signa- 
tures after the test fails. There is a general method of bad signature identification 
which is called “cut and choose” in [3] or “divide and conquer” in |3]. It takes a 
contaminated batch and splits it repeatedly until all bad signatures are identi- 
fied. The efficiency of this method depends on the degree of contamination (or 
how many bad signatures are in the batch) and also on how the bad signatures 
are distributed in the batch. 
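The splitting method can be sketched as follows. This is a minimal illustration, assuming a perfect test oracle `passes` (a hypothetical callable standing in for the test T) and a simple halving rule; the cited works may split batches differently.

```python
def find_bad(batch, passes):
    """Return (bad_items, tests_used) by recursively halving a failing batch.

    `passes(sub_batch)` is an assumed oracle that returns True for a clean
    sub-batch and False for a contaminated one.
    """
    if passes(batch):
        return [], 1                      # one test, nothing to report
    if len(batch) == 1:
        return list(batch), 1             # a failing singleton is a bad signature
    mid = len(batch) // 2
    left_bad, left_tests = find_bad(batch[:mid], passes)
    right_bad, right_tests = find_bad(batch[mid:], passes)
    return left_bad + right_bad, 1 + left_tests + right_tests


# Simulated batch of 8 signatures in which position 5 is bad.
bad = {5}
oracle = lambda sub: not (set(sub) & bad)
found, tests_used = find_bad(list(range(8)), oracle)
```

With the same number of bad signatures, the test count changes depending on whether they fall into the same half or into different halves, which is the dependence on distribution mentioned above.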

Note that identification of bad signatures resembles the problem of error cor- 
rection. To be able to correct errors, the code must clearly identify all positions 
on which errors have occurred. As observed in [1], error correcting codes can be 
applicable for bad signature identification. There is, however, a major difference between
error correcting codes and identification codes (id-codes), which allow bad signatures
to be identified. Computations in error correcting codes are done in the binary
field with EXCLUSIVE-OR addition (XOR). The interaction among valid and
invalid signatures within the batch is governed by INCLUSIVE-OR (logical
OR).

The work is structured as follows. The model for id-codes is studied in Section
C. Section D investigates general properties of id-codes. The general construction
based on the OR-checking matrix and its instantiation based on mutually orthogonal
Latin squares are given in Section E. Hierarchical identification is described in
Section F. A discussion about further work on id-codes closes the work.

C The Model 

The problem we are dealing with is bad signature identification in a batch which 
has failed to pass the test. The test T is a probabilistic algorithm which takes a 
batch of an arbitrary length and produces a binary outcome accept/reject. Any 
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clean batch always passes the test. A dirty batch (which contains one or more 
bad signatures) fails the test with an overwhelming probability. 

Definition 1. Given a batch $B^u = \{(m_i, s_i) \mid i = 1, \ldots, u\}$ of signed documents
($m_i$ is the i-th document and $s_i$ its signature). The identification code IC(u, t)
able to identify up to t bad signatures is a collection of sub-batches $(B_1, \ldots, B_v)$
where $B_i \subset B^u$ such that for any possible pattern of up to t bad signatures,
the outcomes (the syndrome) $S = (T(B_1), \ldots, T(B_v))$ uniquely identify all bad
signatures.

The identification code IC(u, t) can be equivalently represented by its v × u
test-checking matrix $A = [a_{ij}]$ such that

$$a_{ij} = \begin{cases} 1 & \text{if } (m_j, s_j) \in B_i, \\ 0 & \text{otherwise.} \end{cases}$$

Clearly, for a fixed size u of the batch, one would like to obtain a code IC(u, t)
with the parameter v as small as possible. Note that v indicates how many tests
T must be run to identify all bad signatures and it can be considered as the
parameter characterising the efficiency of the code. The parameter v is upper
bounded by u as it is always possible to design a trivial code whose matrix A is
the u × u identity matrix. This code is equivalent to serial validation of signatures
one by one.

The following notation is introduced. The code IC(u, t) is uniquely identified
by its (v × u) test-checking matrix A. The entries of A are binary. Columns of the
matrix A are indexed by the u signatures in a batch. So the matrix A can be seen
as a sequence of columns of the form $A = (A_1, \ldots, A_u)$. The index of the i-th
signature in the batch $B^u$ is the i-th column $A_i$. A row specifies the corresponding
sub-batch which includes all signatures for which the entries are 1.

Note that if only the i-th signature is bad, the syndrome produced for the batch
contaminated by it is equal to $A_i$, or $S(i) = A_i$. Given a batch $B^u$ with t bad
signatures, assume further that the bad signatures have occurred on positions
$(b_1, \ldots, b_t)$ in the batch. Their corresponding indices are $(A_{b_1}, \ldots, A_{b_t})$. Denote
the syndrome produced for the batch as $S(b_1, \ldots, b_t) = A_{b_1} \vee \ldots \vee A_{b_t}$,
where $\vee$ is bit-by-bit inclusive (logical) OR. For example, if

$$A_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \text{ and } A_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} \text{ then } A_1 \vee A_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}.$$
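The syndrome rule can be sketched in a few lines. The representation below (each index stored as a Python list of bits, positions counted from zero) is an assumption made for illustration only.

```python
def syndrome(columns, bad_positions):
    """Bit-by-bit inclusive OR of the indices (columns) of the bad signatures.

    `columns[j]` is the index A_j of signature j as a list of v bits;
    `bad_positions` lists the (zero-based) positions of the bad signatures.
    """
    result = [0] * len(columns[0])
    for b in bad_positions:
        result = [x | y for x, y in zip(result, columns[b])]
    return result


# The two indices from the example above, as columns of length 4.
A = [[1, 1, 0, 0], [1, 0, 1, 0]]
combined = syndrome(A, [0, 1])
```

A clean batch (no bad positions) produces the all-zero syndrome, which corresponds to every sub-batch passing the test.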



D Properties of Id-Codes 



Using an information-theoretic argument, we argue that there is a lower bound
on the parameter v.







Jaroslaw Pastuszak, Josef Pieprzyk, and Jennifer Seberry 



Theorem 1. Given an id-code IC(u, t) which always identifies correctly any t
bad signatures in a batch of size u. Then the number of tests (and the
number of collections) v satisfies the following inequality

$$v \ge \log_2 \sum_{i=0}^{t} \binom{u}{i}. \qquad (1)$$

Proof. Given a batch of u elements contaminated by at most t bad signatures.
The identification of bad signatures is possible if the syndromes are distinct for
all patterns of i bad signatures ($i \le t$), so that, knowing the syndrome, it is possible
to determine the positions of bad signatures in the batch. Note that there are

$$\sum_{i=0}^{t} \binom{u}{i}$$

different identifiable patterns (including the pattern with no bad signature). Now
if we have v sub-batches $(B_1, \ldots, B_v)$, then the test T applied to a single
sub-batch $B_j$, $1 \le j \le v$, provides a binary outcome (pass/fail) so the number of
possible syndromes is $2^v$. Clearly

$$2^v \ge \sum_{i=0}^{t} \binom{u}{i}$$

and the bound described by Equation (1) holds.
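The counting bound is easy to evaluate numerically. The following sketch (the function name is an assumption) returns the smallest integer v for which the number of possible syndromes is at least the number of patterns of up to t bad signatures among u.

```python
from math import comb


def min_tests_lower_bound(u: int, t: int) -> int:
    """Smallest integer v with 2**v >= sum_{i=0..t} C(u, i)."""
    patterns = sum(comb(u, i) for i in range(t + 1))
    # (patterns - 1).bit_length() is the ceiling of log2(patterns) for patterns >= 1.
    return (patterns - 1).bit_length()


bound_one_bad = min_tests_lower_bound(16, 1)    # 17 patterns
bound_three_bad = min_tests_lower_bound(16, 3)  # 697 patterns
bound_half_bad = min_tests_lower_bound(16, 8)   # 39203 patterns
```

For a batch of 16 signatures the bound grows from 5 tests for one bad signature to 16 tests once half of the batch may be bad, so nothing is gained over testing the signatures one by one in that regime.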

Obviously, searching for id-codes makes sense if they are better (take fewer
tests) than the naive id-code which tests batches containing single signatures.
From Theorem 1 we can derive an interesting corollary.

Corollary 1. Id-codes better than the naive id-code exist only if t < u/2.

Proof. If $t \ge u/2$, the number of tests

$$v \ge \log_2 \sum_{i=0}^{t} \binom{u}{i} \ge \log_2 2^{u-1} = u - 1.$$

Thus the number of tests v must be at least u - 1 which is almost the same as
for the naive id-code which requires u tests.

Definition 2. An index $A_i$ includes $A_j$ if $A_i \vee A_j = A_i$.

Given the matrix A of an id-code. Observe that if there are two columns $i \ne j$
such that the index $A_i$ includes $A_j$, then the code is unable to decide whether
there is a single bad signature with the syndrome $A_i$ or two bad signatures
with the syndrome $A_i \vee A_j$. In other words, the matrix A with such indices is
not able to identify bad signatures with indices $A_j$ and $A_i$. We say that the two
indices collide.










Codes Identifying Bad Signatures in Batches

Lemma 1. Given identification coding with a (v × u) test-checking matrix A.
Assume further that there is an index $A_i$ (column $A_i$) such that its Hamming
weight $wt(A_i) = r$, then the number of indices colliding with $A_i$ is
$C_\#(A_i) = 2^r + 2^{v-r} - 2$.

Proof. There are two cases where a collision may occur:

- the index $A_i$ includes other indices ($A_i \vee A_k = A_i$ for some k),

- the index $A_i$ is included in other indices ($A_i \vee A_k = A_k$).

For a given index $A_i$ with Hamming weight r, we can create $2^r - 1$ indices
which are included in $A_i$ (the first case). We can also create $2^{v-r} - 1$ indices
which include $A_i$ (the second case). In effect, we have to exclude $2^r + 2^{v-r} - 2$
indices.



Corollary 2. To increase effectiveness of identification codes we should select
weights of indices so the number of colliding indices is minimal. The smallest
number of colliding indices occurs when the Hamming weight of all indices is v/2.
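The count of colliding indices and the choice of weight can be checked by brute force for small v. The sketch below encodes an index as the bits of a Python integer (an assumption made for convenience) and compares the exhaustive count with the closed form $2^r + 2^{v-r} - 2$.

```python
def colliding_indices(index: int, v: int) -> int:
    """Count v-bit vectors x != index that include index or are included in it."""
    count = 0
    for x in range(2 ** v):
        if x != index and ((x | index) == index or (x | index) == x):
            count += 1
    return count


def closed_form(r: int, v: int) -> int:
    """Number of colliding indices for an index of Hamming weight r."""
    return 2 ** r + 2 ** (v - r) - 2


v = 6
matches = all(
    colliding_indices((1 << r) - 1, v) == closed_form(r, v)  # index with r low bits set
    for r in range(v + 1)
)
best_weight = min(range(v + 1), key=lambda r: closed_form(r, v))
```

For v = 6 the minimum is reached at weight 3, in line with the corollary; for odd v the two weights next to v/2 give the same count.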

Assume that we have two indices $A_i$ and $A_j$. We can define the intersection
of the two as $A_i \wedge A_j$ where $\wedge$ is bit-by-bit logical AND.

Lemma 2. Given two indices $A_i$ and $A_j$ such that $wt(A_i) = r_1$ and $wt(A_j) = r_2$.
Denote $A_c = A_i \wedge A_j$, the maximal index which is contained in both $A_i$
and $A_j$, and $wt(A_c) = r$. Then the number of indices which collide with the pair
$(A_i, A_j)$ is



C#{A„Aj) = 2"-’'! -h 2’^-’’^ -h 2’'!+’'^-’' 



2^v-\-r—ri—r2 Q^ri — r 2^r2 — r 



Proof. Denote $\mathcal{A} = \{A_1, \ldots, A_u\}$. Note that $C_\#(A_i, A_j) \ge C_\#(A_i \vee A_j)$ and
becomes the equality only if r = 0. From Lemma 1 we can write

$$C_\#(A_i \vee A_j) = 2^{r_1 + r_2 - r} + 2^{v - (r_1 + r_2 - r)} - 2.$$

Denote $\#A_i$ and $\#A_j$ the numbers of colliding indices from $A_i$ and $A_j$, respectively,
which have not been considered among the indices from $A_i \vee A_j$. Thus,
we have

$$C_\#(A_i, A_j) = C_\#(A_i \vee A_j) + \#A_i + \#A_j.$$

There are the following cases, the index 

— collides with Ai - there are 2’'^ such indices, 

— collides with Aj \ - there are 2’’^“'’ such indices, 

— collides with A \ {Ai A Aj) - there are g^ch indices. 

Observe that indices colliding with Aj\Ai have been already counted in Cjf,{Ai\/ 
Aj). Further on, note that the zero index (all bits are zero) has been counted. 
Therefore 



ffAi = (2’'=-’' - - 1) and #Aj = - 1). 



Adding the numbers we obtain the final result. 
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Lemma 3. Given identification code determined by its (v x u) matrix A. If there 
is a parameter k < u and a sequence of indices $(A_{i_1}, \ldots, A_{i_k})$ such that

$$\bigvee_{j=1}^{k} A_{i_j} = 1^v$$

then the id-code can identify no more than k bad signatures. Here $\bigvee_{j=1}^{k}$ stands
for bit-by-bit logical OR and $1^v$ is a binary vector of length v containing ones
only.

Proof. Denote $\mathcal{A} = \{A_1, \ldots, A_u\}$ as the set of all indices (columns) of the matrix
A. Create the following two sets: $\mathcal{A}_1 = \{A_{i_1}, \ldots, A_{i_k}\}$ and $\mathcal{A}_2 = \mathcal{A} \setminus \mathcal{A}_1$. The
proof proceeds by contradiction. Assume that any t = k + 1 bad signatures
can be identified. Now we take a sequence of k bad signatures with their indices
$(A_{i_1}, \ldots, A_{i_k})$. Their syndrome is $1^v$. Now if there is an extra bad signature, then
the collection of t bad signatures has the same syndrome: there is a collision
and we have obtained the contradiction.




E Constructions of Id-Codes 

As we know, one would wish to have an identification code which allows for 
gradual increment of t with a possible re-use of all tests conducted for smaller 
ts. Now we present our main construction. 

Definition 3. A $(k+1)n \times n^2$ matrix A with binary elements is an OR-checking
matrix if there are k + 1 ones per column, n ones per row, and the inner product
of any pair of columns is either zero or one.

Lemma 4. Given a $(k+1)n \times n^2$ OR-checking matrix A. Then the OR of any
subset of k columns is unique for k = 1, ..., n - 1.

Proof. For convenience in typesetting we will write these columns as rows by
transposing the matrix, so we are going to consider $A^T$. We consider any k
rows but permute them so that the ones are moved to the left of each row as
far as possible. We now use a simple counting argument to look at the
intersection patterns of the rows. If any two rows have an intersection +1, the
ones (written as x) will use a total of $\frac{1}{2}(k+1)(k+2) - 1$ columns and can
be represented as:

    k+1       k        k-1     ...  |  2
    xxx...x   00...0   0...0   ...  |  00
    x00...0   xx...x   0...0   ...  |  00
    0x0...0   x0...0   x...x   ...  |  00
    000...x   00...x   0...x   ...  |  xx

If any pair of rows do not have intersection +1 then more than
$\frac{1}{2}(k+1)(k+2) - 1$ columns will be needed to represent the patterns of ones but the last row
will always have at least 2 elements +1 at the right of the row which have no
non-zero element in the column above either of them.

Now suppose that the matrix yielded that any k - 1 rows corresponding to
bad signatures gave a unique OR but that there are two solutions which give the
same result for k rows indicating bad signatures. We rearrange the rows in our
pattern representative, if necessary, so one of these two solutions is the last row.
We now consider the other solution. For the first k - 1 vectors and the second
solution to cover the same number of columns the second solution must have two
+1 at the right of the row which have no non-zero element in the column above either of
them. But this means the first and second solution have an intersection
of at least 2 ones, contradicting the definition of the OR-checking matrix. Hence
any collection of k rows produces OR sums which are distinct.

We note that this proof does not extend to a collection of k + 1 rows because
in that case we could only assume the last row to have more than one element
+1 at the right of the last row which has no non-zero element in the column above it.
This does not lead to any contradiction.

Corollary 3. Given a $(k+1)n \times n^2$ OR-checking matrix A whose every two
column intersection is either zero or one. Then there is an IC(u, t) code which
is capable of identifying up to t = n - 1 bad signatures within a batch of size $u = n^2$.

The identification code based on OR-checking matrices is efficient as it allows
all previous results to be re-used if the guess about the parameter t has been wrong.
Given a batch of the size $u = n^2$, the $(n^2 \times u)$ OR-checking matrix A is
created. Denote by $A^{(t)}$ a shortened version of A containing the first (t + 1)n rows
of A; t = 1, 2, ..., n - 1.

1. The identification process starts from the assumption that t = 1. First, a
collection of 2n tests T is run for batches defined by rows of the matrix $A^{(1)}$.
If the bad signatures are not correctly identified (i.e. the batch without bad
signatures still fails the test T), then it is assumed that t = 2. Otherwise the
process ends.

2. Assume that the identification using $A^{(t-1)}$ has failed to identify bad signatures
(t = 2, 3, ..., n - 1). The collection of necessary tests is defined by $A^{(t)}$.
Note that $A^{(t)}$ differs from $A^{(t-1)}$ in that it contains n additional rows. The
identification process can be accomplished by running n additional tests
corresponding to the batches defined by rows in $A^{(t)}$ which are not in
$A^{(t-1)}$. If the identification has not been successful, t is incremented by 1 and the
process continues.

The identification fails if t > n. 
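The incremental process can be sketched as below. Several points are assumptions of this sketch rather than statements from the text: the batch has size p*p for a prime p, the sub-batches come from the grid rows plus the Latin squares $L_m(i, j) = (m i + j) \bmod p$, the test oracle `passes` is perfect, and the stopping rule (stop once at most t candidates remain) replaces the re-test of the cleaned batch described in step 1.

```python
def parallel_classes(p):
    """Yield p groups of p sub-batches each for a batch of p*p signatures (p prime)."""
    yield [[i * p + j for j in range(p)] for i in range(p)]      # grid rows
    for m in range(1, p):                                         # Latin square m
        cls = [[] for _ in range(p)]
        for i in range(p):
            for j in range(p):
                cls[(m * i + j) % p].append(i * p + j)
        yield cls


def identify(p, passes):
    """Return (bad_positions or None, tests_run), adding p tests per extra guess of t."""
    candidates = set(range(p * p))
    tests_run = 0
    for used, cls in enumerate(parallel_classes(p), start=1):
        for sub_batch in cls:
            tests_run += 1
            if passes(sub_batch):
                candidates -= set(sub_batch)      # members of a clean sub-batch are good
        t = used - 1                              # current guess for the number of bad signatures
        if t >= 1 and len(candidates) <= t:
            return sorted(candidates), tests_run
    return None, tests_run                        # more than p - 1 bad signatures


def oracle_for(bad):
    """Simulated test T: a sub-batch passes if it holds no bad signature."""
    return lambda sub_batch: not (set(sub_batch) & bad)


one_bad = identify(5, oracle_for({7}))
two_bad = identify(5, oracle_for({0, 6}))
too_many = identify(5, oracle_for({0, 1, 2, 3, 4}))
```

In this simulation a single bad signature costs 2p tests and each further bad signature costs p more, matching the counts given in the procedure; the earlier test results are never discarded.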

The construction also gives the upper bound on the number v of necessary 
tests to identify t bad signatures. 

Corollary 4. The number v of tests necessary to identify t bad signatures in
the batch of size u satisfies the following inequality $v \le (t + 1)\sqrt{u}$.
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There are many combinatorial structures which can be used to give the re- 
quired OR-checking matrices for example transversal designs and group divisible 
designs. However we give a rich construction based on Latin squares. 

A Latin square of order n is an n × n array in which n different symbols, say
a, b, ..., each occur once in each row and column. Two Latin squares are said to
be mutually orthogonal if, when the squares are compared element by element,
each of the distinct pairs occurs exactly once. Formally, two Latin squares, L and
L', are said to be mutually orthogonal if L(a, b) = L(c, d) and L'(a, b) = L'(c, d)
implies a = c and b = d. For further information, refer to . 

Lemma 5. Suppose there are k mutually orthogonal Latin squares of order n.
Then there is a $(k+1)n \times n^2$ OR-checking matrix.

Proof. We use the auxiliary matrices described in [2].


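Example 1. Let

$$M_1 = \begin{bmatrix} a & b & c & d \\ b & a & d & c \\ c & d & a & b \\ d & c & b & a \end{bmatrix}, \quad
M_2 = \begin{bmatrix} a & b & c & d \\ c & d & a & b \\ d & c & b & a \\ b & a & d & c \end{bmatrix}, \quad
M_3 = \begin{bmatrix} a & b & c & d \\ d & c & b & a \\ b & a & d & c \\ c & d & a & b \end{bmatrix}$$

be three mutually orthogonal Latin squares of order 4 on the symbols $x_1 = a$,
$x_2 = b$, $x_3 = c$ and $x_4 = d$. Define $M_{ij}$, $1 \le i \le k$, by

$$(M_{ij})_{ef} = \begin{cases} 1 & \text{if } (M_i)_{fj} = x_e, \\ 0 & \text{otherwise,} \end{cases}$$

where $1 \le e, f \le 4$. So $M_{ij}$, $1 \le i \le 4$ and $1 \le j \le 4$, can be written as

[16 × 16 binary matrix; its entries are not recoverable from the source]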



Corollary 5. Let q > 2 be a prime power. Then there are q - 1 mutually orthogonal
Latin squares of order q.

Many other results are also known, for example for every $n \ge 3$ except 6
there are at least two orthogonal Latin squares of order n and for n > 90 there
are at least 6.
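One way to obtain an OR-checking matrix from Latin squares can be sketched for prime orders. The layout below is an assumption of this sketch and is not necessarily the auxiliary-matrix layout of [2]: the first p rows are the grid rows, and each of the k squares $L_m(i, j) = (m i + j) \bmod p$ contributes p rows, one per symbol.

```python
def or_checking_matrix(p: int, k: int):
    """Build a (k+1)p x p^2 binary matrix from k Latin squares of prime order p."""
    assert 1 <= k <= p - 1
    cells = range(p * p)                       # cell number = i * p + j
    rows = [[1 if cell // p == i else 0 for cell in cells] for i in range(p)]
    for m in range(1, k + 1):
        for symbol in range(p):
            rows.append(
                [1 if (m * (cell // p) + cell % p) % p == symbol else 0 for cell in cells]
            )
    return rows


def is_or_checking(matrix, p: int, k: int) -> bool:
    """Check the three defining properties of an OR-checking matrix."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    ones_per_column = all(sum(col) == k + 1 for col in columns)
    ones_per_row = all(sum(row) == p for row in matrix)
    small_overlap = all(
        sum(x & y for x, y in zip(columns[a], columns[b])) <= 1
        for a in range(n_cols)
        for b in range(a + 1, n_cols)
    )
    return ones_per_column and ones_per_row and small_overlap


A = or_checking_matrix(5, 3)
shape = (len(A), len(A[0]))
valid = is_or_checking(A, 5, 3)
```

The construction relies on p being prime so that multiplication by m is invertible modulo p; prime powers would need finite-field arithmetic, which this sketch does not cover.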
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F Hierarchical Identification 

Identification codes are designed to work with a batch of fixed size. In practice, 
one would expect to have an identification scheme which is going to work with 
a batch of arbitrary length. Hierarchical identification provides such a scheme. 
Consider a family of id-codes $\mathcal{F} = \{IC(v, t)\}$ with some well defined parameters.



Definition 4. Given a batch of arbitrary length u. Hierarchical identification
based on the family of identification codes $\mathcal{F}$ is a procedure defined recursively:

- stopping case: if the size of the batch u is smaller than or equal to some parameter
v so that we can use the identification code $IC(v, t) \in \mathcal{F}$, then we apply it (flat
identification), otherwise

- recursive step: if the size of the batch u is bigger than the highest parameter
$v_{max}$ in the family $\mathcal{F}$, then it is divided into $\ell$ sub-batches such that $\ell \le v_{max}$
and there is some $IC(v, t) \in \mathcal{F}$ which can be used to identify contaminated
sub-batches, where $\ell \le v$ and $t' \le t$.

The hierarchical identification is denoted by $HI(\mathcal{F})$.

Hierarchical identification can be based on different collections of id-codes. 
There are two extreme cases: 

- $\mathcal{F}$ consists of an infinite sequence of id-codes,

- the family $\mathcal{F}$ is reduced to a single id-code.

Whatever the underlying family $\mathcal{F}$, one would ask the following
questions:

- What is the minimum (maximum, average) number of tests which is necessary
to identify all bad signatures?

- Given a family $\mathcal{F}$ and the number t of bad signatures in the batch, is there
any procedure which minimises the number of tests?

F.1 Hierarchical Identification with Infinite $\mathcal{F}$

Consider id-codes defined in Section E. Each id-code can be uniquely indexed
by a prime power p > 2. For this index, the code is $IC(p^2, p - 1)$. The family is

$$\mathcal{F} = \{IC(p^2, p - 1) \mid p \text{ is a prime power}; p > 2\}.$$

Note that $IC(p^2, p - 1)$ can be used to identify up to p - 1 bad signatures. If
the number of bad signatures is $t \le p - 1$, the code will use (t + 1)p + 1 tests. If
t > p - 1, then the code fails.

Let $\#T(\mathcal{F})$ be the number of tests necessary to identify all t bad signatures
in a batch $B^u$. Now we are trying to evaluate lower and upper bounds for the
number $\#T(\mathcal{F})$. Assume that the size of the batch $u = p^2$ where p is a prime
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power. Now we choose somehow pi < p and divide the batch into pf sub- 
batches. Each sub-batch contains elements. Note that we have to consider 
only codes for which t > — 1 as otherwise the code may fail. 

Let the t bad signatures be clustered into r sub-batches each containing ti 
bad signatures so t = where r < pi — 1 and naturally, ti < The 

number (iF) has two components: 

1. the number of tests necessary to identify all contaminated sub-batches - this
takes $\alpha = (r + 1)p_1 + 1$,

2. the number of tests necessary to identify bad signatures within the sub- 
batches. For a given sub-batch, we count the number of necessary tests. First 
we choose a prime power p 2 such that P 2 — ^ the sub-batch contains ti 
bad signatures we need 

$\beta_i = (t_i + 1)p_2 + 1$



tests. 

The number of tests $\#T(\mathcal{F}) = \alpha + \sum_{i=1}^{r} \beta_i$ which after simple transformations
gives $\#T(\mathcal{F}) = (r + 1)p_1 + p_2(t + r) + (r + 1)$. The number $\#T(\mathcal{F})$ depends
on the random parameter r and grows linearly with r so $\#T(\mathcal{F})$ is smallest for
r = 1 when all bad signatures occur in a single sub-batch. $\#T(\mathcal{F})$ takes on the
maximum for $r = t = p_1 - 1$. So we have the following corollary.

Corollary 6. Given a batch with t bad signatures. Hierarchical identification
with infinite $\mathcal{F}$ will consume $\#T(\mathcal{F})$ tests where

$$2p_1 + (t + 1)p_2 + 2 \le \#T(\mathcal{F}) \le p_1^2 + 2p_1 p_2 + p_1 - 2p_2.$$
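The two ends of the range can be checked numerically. The sketch below evaluates the count $(r + 1)p_1 + p_2(t + r) + (r + 1)$ for given parameters; the function name and the sample values $p_1 = 5$, $p_2 = 3$ are assumptions chosen for illustration.

```python
def hierarchical_tests(p1: int, p2: int, t: int, r: int) -> int:
    """Tests used when t bad signatures fall into r contaminated sub-batches."""
    assert 1 <= r <= t
    return (r + 1) * p1 + p2 * (t + r) + (r + 1)


p1, p2 = 5, 3
t = p1 - 1                                          # the largest number of bad signatures handled
best_case = hierarchical_tests(p1, p2, t, 1)        # all bad signatures in one sub-batch
worst_case = hierarchical_tests(p1, p2, t, t)       # every bad signature in its own sub-batch
lower_bound = 2 * p1 + (t + 1) * p2 + 2
upper_bound = p1 ** 2 + 2 * p1 * p2 + p1 - 2 * p2
```

For these values the count ranges from 27 to 54 tests, and both ends coincide with the bounds of the corollary.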



F.2 Hierarchical Batching with a Single IC(v, t)

In some applications, one would like to keep the identification procedure as
simple as possible, that is, to use a single identification code or in other words
the family $\mathcal{F}$ contains a single element. Again, knowing the number t of bad
signatures in a batch $B^u$, one would like to see how the number of tests necessary
to identify all bad signatures varies (lower and upper bounds) as a function of
u and t.

Assume that $v = p^2$ and we apply the id-code $IC(p^2, p - 1)$. Given a batch
$B^u$. There are two ways bad signatures can be identified:

— Serial identification - a batch is divided into sub-batches. For each sub- 
batch, the id-code is used. This is a serial application of flat identification. 

— Hierarchical identification - a batch is divided into v sub-batches and the 
id-code is applied for the sub-batches and identifies the contaminated sub- 
batches. The process is repeated for contaminated sub-batches as many times 
as necessary to identify bad signatures. 
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Consider serial identification. Note that if a batch is clean {t = 0), it 
takes one test to verify it. If the batch is contaminated by t < p bad signatures,
the identification will take (t + 1)p + t + 1 tests. Assume that a batch $B^u$ has
been divided into $R = u/p^2$ sub-batches (if u is a multiple of $p^2$) among which
r sub-batches are dirty and the other R - r are clean. All clean sub-batches
consume one test each. A dirty sub-batch $B_i$ takes $(t_i + 1)(p + 1)$ tests where
$\sum_{i=1}^{r} t_i = t$. So the number of tests required to identify bad signatures is

$$\frac{u}{p^2} - r + (p + 1)(t + r).$$

Note that the number of tests is a random variable which ranges from r = 1 
when all bad signatures happen to be in one sub-batch, to r = t when there are 
t sub-batches each containing a single bad signature. 
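The dependence on r can be tabulated with a short helper. This sketch simply evaluates $u/p^2 - r + (p + 1)(t + r)$; the function name and the sample values are assumptions for illustration.

```python
def serial_tests(u: int, p: int, t: int, r: int) -> int:
    """Tests used by serial identification with the id-code IC(p^2, p - 1).

    u is the batch size (a multiple of p^2), t the number of bad signatures
    and r the number of dirty sub-batches.
    """
    assert u % (p * p) == 0
    R = u // (p * p)                    # number of sub-batches
    assert 1 <= r <= min(t, R)
    # R - r clean sub-batches cost one test each; the dirty ones cost (p + 1)(t + r) in total.
    return R - r + (p + 1) * (t + r)


clustered = serial_tests(81, 3, 2, 1)   # both bad signatures in one sub-batch
spread = serial_tests(81, 3, 2, 2)      # one bad signature in each of two sub-batches
```

Each additional dirty sub-batch adds p tests in this count, so clustered bad signatures are cheaper to identify than scattered ones.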

Consider the second case of hierarchical identification. To simplify our deliberations,
assume that $u = p^{2j}$ for some integer j. Denote by $\#T(j, t)$ the number
of tests needed to identify t bad signatures in a batch $B^{p^{2j}}$ when the id-code
is applied to the sub-batches each containing $p^{2(j-1)}$ signatures. The following
recursive equation is easy to derive

$$\#T(j, t) = (r + 1)p + r + \sum_{i=1}^{r} \#T(j - 1, t_i)$$

where r is a random variable which indicates the number of contaminated sub-batches
and $t_i$ are numbers of bad signatures in the corresponding contaminated
sub-batches; i = 1, ..., r.



G Conclusions 

The generic class of id-codes has been defined using the test-checking matrix A.
The (v × u) matrix A determines the necessary tests. The syndrome is the binary
vector which gives the test results for sub-batches defined by rows of A. The
syndrome is also equal to the bit-by-bit inclusive-OR of indices which correspond to
bad signatures. We have investigated the interaction of indices and found that
to maximise the identification capability of an id-code, one would need to choose
indices with Hamming weight equal to v/2.

The main construction of id-codes uses the so-called OR-checking matrix.
The id-code takes a sequence of signatures and allows up to n - 1
bad signatures to be identified. The nice characteristic of the code is that the number of tests
can be reduced if the batch contains fewer than n - 1 bad signatures. To identify a
single bad signature, it takes 2n tests. Each additional bad signature adds n
tests necessary for correct identification. There are many combinatorial
structures which can be used to design id-codes. We have shown how mutually
orthogonal Latin squares can be applied to construct id-codes.

We have not discussed the identification procedure of bad signatures in our
id-code. The problem is far less complicated than, for example, in error correcting
codes, mainly because of the monotonicity of the Hamming weight of the syndrome.
In other words, indices of bad signatures must be included in the syndrome. The
implementation of this process can be done by

- checking all signatures one by one and marking those whose index collides
with the syndrome,

- removing all signatures belonging to those sub-batches which have passed
the test (they are identified by zeros in the syndrome). In other words, all bad
signatures are in the set

$$B^u \setminus \bigcup_{T(B_i) = 0} B_i$$

where Bi is the sub-batch determined by the i-th row of the id-code. 
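The second option can be sketched as a small decoder. The matrix used below (row and column sub-batches of a 3 × 3 grid, which satisfies the OR-checking conditions with two ones per column) and the helper names are assumptions chosen to keep the example short.

```python
def grid_matrix(n: int):
    """Test-checking matrix whose sub-batches are the rows and columns of an n x n grid."""
    cells = range(n * n)
    row_tests = [[1 if c // n == i else 0 for c in cells] for i in range(n)]
    col_tests = [[1 if c % n == j else 0 for c in cells] for j in range(n)]
    return row_tests + col_tests


def syndrome_of(matrix, bad):
    """Simulated test results: 1 if the sub-batch contains a bad signature, else 0."""
    return [1 if any(row[j] for j in bad) else 0 for row in matrix]


def decode(matrix, syndrome):
    """Keep the signatures that belong to no sub-batch that passed the test."""
    n_cols = len(matrix[0])
    return [
        j
        for j in range(n_cols)
        if all(syndrome[i] for i in range(len(matrix)) if matrix[i][j])
    ]


A = grid_matrix(3)
single = decode(A, syndrome_of(A, {4}))
double = decode(A, syndrome_of(A, {0, 4}))
```

With one bad signature the decoder returns exactly that position. With two bad signatures this particular matrix leaves four candidates, which illustrates why more rows are needed as t grows.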

Id-codes can be applied directly to a contaminated batch. We called this flat
identification. Alternatively, a contaminated batch can be first grouped into sub- 
batches and the id-code is applied to sub-batches and identifies contaminated 
sub-batches. This process can be done many times until bad signatures are iden- 
tified. This is the hierarchical identification. 

There are still many open problems. The obvious one is whether the construction
given in this work is “optimal”, i.e. whether identification of bad signatures
consumes the smallest possible number of tests. Hierarchical identification makes it
possible to avoid the natural limitations imposed by the size of the batch and to apply the
id-code in hand to a batch of arbitrary length. Is there any strategy for grouping
signatures into sub-batches so that the number of necessary tests is minimised?
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Abstract. This paper proposes a distributed encryption scheme, where 
any party can “signcrypt” a message and distribute it to a designated 
group and any member in the receiving group can “de-signcrypt” the 
message. We also propose a group signcryption, where, given a desig- 
nated group, any member in the group can signcrypt a message on the 
group’s behalf. A group signcrypted message can be distributed to an- 
other group. The proposed schemes have potential applicability in elec- 
tronic commerce. 



Key Words: Signcryption, Public-key Cryptography. 

A Introduction 

Digital signatures are used to ensure the authenticity of information and its 
sender, whereas the information confidentiality is achieved using encryption 
schemes. Hence to achieve both authenticity and confidentiality both signing 
and encryption techniques are needed. That is, to secure the message, it is first 
signed and then encrypted. The total computational cost therefore includes the 
computational costs for performing digital signatures and encryption. Recently, 
a new scheme referred to as signcryption that combines encryption with signature
has been proposed [1]. The main advantage of the signcryption scheme is
claimed to be the savings in computational cost achieved by combining signa- 
ture and encryption in a specific way. The total computational cost is less than 
the sum of the computational costs of encryption and signature. It is impor- 
tant to note that the signcryption scheme is different from the normal “sign 
then encrypt” method; it forms a special combination of signing and encryption 
procedures. 

This paper proposes a distributed signcryption scheme that can be used for 
distributing a signcrypted message to a designated group. First, we consider a 
distributed encryption scheme that allows any member in the system to encrypt 
a message using the group public key and to send it to members of the group. 
Any member of the group can decrypt the message using his or her private key. 
Each member of the group has a private key that matches with the group public 
key. We assume the existence of a group manager, a member trusted by all the 
members of the group, who is responsible for constructing the group public key 
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and the management of the group. Then the paper describes the distributed 
signcryption scheme, where the signcryption scheme allows any member in the 
system to signcrypt a message and send it to members of a group. Any legal mem- 
ber of the group can de-signcrypt the message. This scheme is computationally 
efficient in the case of multiple receivers by inheriting the property from the orig- 
inal signcryption scheme. We then extend our scheme to a group signcryption 
scheme whereby the scheme enables a member of one group to signcrypt a mes- 
sage on behalf of the group and distribute it to any member in another group. 
This group signcryption scheme preserves the anonymity of the signer/sender of 
the message. The schemes proposed in this paper are computationally efficient in
a distributed environment. This is because only one signcryption is needed for a 
group of n recipients. 

The rest of this paper is organized as follows. Section 2 gives an overview of 
the basic signcryption scheme and the ElGamal encryption scheme. Section 3 
describes our distributed encryption scheme. Section 4 presents our distributed 
signcryption scheme and Section 5 extends our scheme to a group signcryption. 

2 Preliminaries 

In this section, we briefly look at the earlier digital signcryption scheme [1] and 
the ElGamal encryption scheme [2]. Throughout this paper, we denote by $p$ a 
large prime number, $Z_p^*$ a multiplicative group of order $q$ for $q = p - 1$, and 
$g \in Z_p^*$ a primitive element. 

2.1 Digital Signcryption 

Signcryption refers to a cryptographic method that fulfils both the functions 
of secure encryption and digital signature, but with the total computational 
cost smaller than the sum of the computational costs required for signature and 
encryption. 

The basic signcryption scheme is based on the Digital Signature Standard 
(DSS) [3] with a minor modification that makes the scheme more computationally 
efficient. The modified DSS is referred to as SDSS and there are two versions 
of the SDSS. 

(1) SDSS1: The signer chooses a number $x \in Z_q$ at random and computes: 

$$r = H(g^x \bmod p \,\|\, m), \qquad s = x\,(r + x_s)^{-1} \bmod q,$$ 

where $g^q = 1 \bmod p$, $x_s$ is the private key of the signer and $k = g^x \bmod p$ is 
the encryption key that is used to encrypt $m$ using a symmetric key method 
to obtain the ciphertext $c$. The signcrypted text consists of $(r, s, c)$. 

The verifier recovers the key by computing 

$$k = (y_s\, g^r)^s \bmod p,$$ 

where $y_s = g^{x_s} \bmod p$ is the signer's public key. $k$ can be used to decrypt the 
ciphertext $c$ to obtain $m$. 

(2) SDSS2: The signer chooses a number $x$ at random and computes: 

$$r = H(g^x \bmod p \,\|\, m), \qquad s = x\,(1 + x_s r)^{-1} \bmod q, \qquad k = g^x \bmod p.$$ 

The signcrypted text consists of $(r, s, c)$. 

The verifier recovers the key by computing 

$$k = (y_s^r\, g)^s \bmod p.$$ 

Two alternative schemes that can be used in signcryption are ElGamal's digital 
signature scheme [4,2] and Schnorr's digital signature scheme [5]. 
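
The key-recovery identities for SDSS1 and SDSS2 can be checked numerically. The following is only a minimal sketch under stated assumptions: the toy parameters, the SHA-256 stand-in for the hash, and the function names are illustrative and not taken from the paper, and q is assumed here to be a prime factor of p - 1 with g of order q, so that the inverses modulo q exist.

```python
import hashlib

# Toy parameters (assumed for illustration): q is a prime factor of p - 1
# and g has order q modulo p. Real deployments use far larger values.
P, Q, G = 23, 11, 2


def hash_to_zq(k: int, m: bytes) -> int:
    """Stand-in for the hash H(k || m), reduced into Z_q."""
    digest = hashlib.sha256(str(k).encode() + b"|" + m).digest()
    return int.from_bytes(digest, "big") % Q


def sdss_sign(x_s: int, m: bytes, x: int, variant: int):
    """Return (k, r, s), or None when the denominator is not invertible."""
    k = pow(G, x, P)
    r = hash_to_zq(k, m)
    denom = (r + x_s) % Q if variant == 1 else (1 + x_s * r) % Q
    if denom == 0:
        return None  # the signer would retry with a fresh x
    s = x * pow(denom, -1, Q) % Q  # modular inverse needs Python 3.8+
    return k, r, s


def sdss_recover(y_s: int, r: int, s: int, variant: int) -> int:
    """Recover k from the signer's public key y_s = g^{x_s} mod p."""
    if variant == 1:
        base = y_s * pow(G, r, P) % P   # y_s * g^r
    else:
        base = pow(y_s, r, P) * G % P   # y_s^r * g
    return pow(base, s, P)
```

In both variants the exponent of g collapses to x modulo q, which is why the verifier obtains the same k as the signer without learning x or the private key.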

The complete signcryption protocol is derived from the SDSS and is described 
below. In the protocol below, we denote by $E_k$ a symmetric encryption 
under key $k$, and by $D_k$ the corresponding decryption. We also denote by $H(\cdot)$ a strong 
one-way hash function such as the Secure Hash Standard (SHS) [6]. The symmetric 
encryption and decryption algorithm can be an algorithm such as DES [7]. 
We also use a keyed one-way hash function denoted by $H_k(\cdot)$. The 
characteristic of a keyed hash function is that the hashed value can be 
computed only when the key is known, and it is computationally infeasible to 
find message $m_1$ from $m_2$ if $m_2 = H_k(m_1)$. $A$ is the person who signcrypts the 
message $m$. The key $k$ is partitioned into $k_1$ and $k_2$ of appropriate length and 
this partitioning procedure is assumed to be public and known to both parties 
$A$ and $B$. The encryption of the message is done using the key $k_1$. The key $k_2$ is 
used in the keyed hash function. The signature part $s$ is based on either SDSS1 
or SDSS2. Upon receipt of the signcrypted text $(r, s, c)$, $B$ can recover the key 
$k$, split it into $k_1$ and $k_2$ and use them to decrypt $c$ and verify $r$. 

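A (sender): 

  $x \in Z_q$ 
  $k = y_b^x \bmod p$ 
  Split $k$ into $k_1$ and $k_2$ 
  $r = H_{k_2}(m)$ 
  $s = x\,(r + x_a)^{-1} \bmod q$ (SDSS1) 
  $s = x\,(1 + r x_a)^{-1} \bmod q$ (SDSS2) 
  $c = E_{k_1}(m)$ 

B (recipient): 

  $k = (y_a\, g^r)^{s x_b} \bmod p$ (SDSS1) 
  $k = (y_a^r\, g)^{s x_b} \bmod p$ (SDSS2) 
  Split $k$ into $k_1$ and $k_2$ 
  $m = D_{k_1}(c)$ 
  Verify $r = H_{k_2}(m)$ 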



2.2 ElGamal Encryption Scheme 

The security of ElGamal's encryption scheme [2] is based on the difficulty of 
computing discrete logarithms in a finite field. In the ElGamal encryption scheme, 
the recipient is assumed to have a public/private key pair $(y = g^x \bmod p,\ x)$, 
where $g$ and $p$ are public information and can be shared among a 
group of users. 

To encrypt a message $m$, the sender first chooses a random number $k$ such 
that $k$ is relatively prime to $p - 1$. The sender then computes the following: 

$$c_1 = g^k \bmod p, \qquad c_2 = m\,y^k \bmod p.$$ 

The pair $(c_1, c_2)$ is the ciphertext. 

The recipient can decrypt $(c_1, c_2)$ to retrieve $m$, as s/he knows $x$: 

$$m = c_2 / c_1^x \bmod p.$$ 
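
The encryption and decryption steps above can be traced with small numbers. This is a toy sketch only: the prime 23, the primitive root 5 and the function names are assumptions chosen for illustration, and such sizes offer no security.

```python
from math import gcd

# Toy public parameters (illustrative assumption): 5 is a primitive root mod 23.
P, G = 23, 5


def public_key(x: int) -> int:
    """y = g^x mod p for the recipient's private key x."""
    return pow(G, x, P)


def encrypt(m: int, y: int, k: int):
    """Return (c1, c2) = (g^k mod p, m * y^k mod p)."""
    if gcd(k, P - 1) != 1:
        raise ValueError("k must be relatively prime to p - 1")
    return pow(G, k, P), m * pow(y, k, P) % P


def decrypt(c1: int, c2: int, x: int) -> int:
    """m = c2 / c1^x mod p, with the division done as a modular inverse."""
    return c2 * pow(pow(c1, x, P), -1, P) % P
```

For example, with private key x = 6 and random k = 3, `decrypt(*encrypt(10, public_key(6), 3), 6)` returns the original message 10.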



3 Distributed Encryption 

This section describes the distributed encryption scheme. Consider a group with 
a set of members and with a public key referred to as the group public key. 
Each member of the group has a private key that matches with the group public 
key. In distributed encryption, any member in the system (within or outside the 
group) can encrypt a message using the group public key and can send it to 
members of the group. Any member of the group can then decrypt the message 
using his or her private key. 

3.1 Construction of a Group 

We assume the existence of a group manager who is trusted by the members of 
the group. The group manager is responsible for constructing the group public 
key and updating group members. 

In order to construct a group, the group manager selects a set of integers 
$x_i \in_R Z_q$, $i = 1, \ldots, n$, and computes the coefficients $a_0, \ldots, a_n \in Z_q$ of the 
following polynomial: 

$$f(x) = \prod_{i=1}^{n} (x - x_i) = \sum_{j=0}^{n} a_j x^j \qquad (1)$$ 

This function has the following property: define $\beta_i \leftarrow g^{a_i} \bmod p$ for $i = 0, 1, \ldots, n$; 
then we have 

$$F(x_l) = \prod_{i=0}^{n} \beta_i^{x_l^i} = 1 \bmod p \qquad (2)$$ 

This is because $F(x_l) = g^{f(x_l)}$ and $f(x_l) = 0$ in $Z_{p-1}$. 
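
Property (2) can be verified on a toy group. The sketch below is illustrative only: the parameters (p = 23, primitive root 5), the member secrets and the helper names are assumptions, and it covers only the basic polynomial, not the tamper-proofing step described afterwards.

```python
P = 23      # toy prime (illustrative assumption)
Q = P - 1   # exponents are reduced modulo q = p - 1
G = 5       # a primitive root modulo 23


def poly_from_roots(roots, modulus):
    """Coefficients a_0..a_n of prod (x - x_i), lowest degree first."""
    coeffs = [1]
    for root in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] - root * a) % modulus   # contributes -x_i * a_j
            nxt[j + 1] = (nxt[j + 1] + a) % modulus  # contributes a_j * x
        coeffs = nxt
    return coeffs


def public_tuple(roots):
    """beta_i = g^{a_i} mod p for every coefficient of f."""
    return [pow(G, a, P) for a in poly_from_roots(roots, Q)]


def evaluate(betas, x):
    """F(x) = prod beta_i^(x^i) mod p, which equals g^{f(x)} mod p."""
    result = 1
    for i, beta in enumerate(betas):
        result = result * pow(beta, pow(x, i, Q), P) % P
    return result
```

With member secrets 3, 7 and 12, `evaluate` returns 1 for each of them, while a value such as 5, which is not a root of f modulo p - 1, gives a result different from 1.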

The above property is important for us to construct our system. However, 
it has been proved that the polynomials are not secure, even if $\{a_i \mid i = 1, \ldots, n\}$ are 
kept secret from a potential adversary [8]. In order to make the polynomials, 
$\{\beta_i \mid i = 0, \ldots, n\}$, tamper-proof, we adopt the method given in [8]: 

For the given $\{a_0, a_1, \ldots, a_n\}$, we define a new set $\{a'_0, a'_1, \ldots, a'_n\}$, where 
$a'_0 = a_0$, $a'_n = a_n$, $a'_1 = \cdots = a'_{n-1} = \sum_{j=1}^{n-1} a_j$. Define $\{\beta'_0, \beta'_1, \ldots, \beta'_n\} \leftarrow 
\{g^{a'_0}, g^{a'_1}, \ldots, g^{a'_n}\}$ and $A_l = \sum_{i=1}^{n-1} \sum_{j=1,\, j \ne i}^{n-1} a_j x_l^i$; then the property (cf. Equation 
(2)) still holds: 

$$g^{-A_l} \prod_{i=0}^{n} (\beta'_i)^{x_l^i} = 1 \pmod p, \quad \text{for all } x_l,\ l = 1, \ldots, n.$$ 

In order to construct group public key, the group manager picks a num- 
ber 'y Gr Zg for member I at random and computes its inverse 7“^ and pa- 
rameters Pi <— ^Ai mod q. The group public key is defined as an n -I- 1 tuple, 
{/3'i, ...,/3^+J ^ {(3[,...,l5'^,g'^ }. The group manager keeps 7 and all {aj 

secret and gives $x_l$ and $\rho_l$ to group member $l$, who uses $x_l$ as her secret key. 

Proposition 1. The group public key tuple, $\{\beta'_i \mid i = 0, \ldots, n+1\}$, cannot be 
illegally modified to add or delete a group member. 

Notice that $\{g^{a_i} \mid i = 0, \ldots, n\}$ are not given in a clear form, i.e. they are hidden 
in $\{g^{a'_i} \mid i = 1, \ldots, n-1\}$, and also $\{A_l \mid l = 1, \ldots, n-1\}$ remain secret from the 
public including group members. It is impossible to modify the public key such 
that $F'(x_l) = 0$. The detailed proof can be found in Ref. [8]. The above group 
encryption scheme has the following property. 

Proposition 2. A member is said to belong to the group iff the member can 
prove that s/he knows the secret $x_l$ that satisfies $F'(x_l) = 1 \bmod p$. 

The proof involves the use of non-interactive discrete log proof given in [^. 
Using this method, the prover can show that s/he knows the secret xi without 
releasing the secret to the verifier. In the non-interactive method, the prover and 
the verifier do not need to interact with each other. That is, the verifier does not 
need to send a challenge to the prover as part of the proof process; the challenge 
is actually computed by the prover. We omit the detail of such proof here. 



3.2 Distributed Encryption 

Assume that a member Alice wants to send a message $m$ securely to a designated 
group $G$. She picks the encryption key, $k'$, computes $k = H(m)$, and then 
encrypts $m$ to obtain the ciphertext $c = (c_1, c_2)$ as follows: 

$$c_1 \leftarrow \{a_0, a_1, \ldots, a_{n+1}\} = \{g^{k'} (\beta'_0)^k, (\beta'_1)^k, \ldots, (\beta'_{n+1})^k\}, \qquad c_2 = m\, g^{k'} \bmod p.$$ 

Any member in $G$ can decrypt the ciphertext $c$ to obtain $m$. Let Bob be 
member $l$. Bob does the following: 



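$$c'_1 = a_0 \prod_{i=1}^{n} a_i^{x_l^i}\, a_{n+1}^{\rho_l} 
= g^{k'} \Big( \prod_{i=0}^{n} (\beta'_i)^{x_l^i}\, (\beta'_{n+1})^{\rho_l} \Big)^{k} 
= g^{k'} g^{k f(x_l)} 
= g^{k'} \bmod p$$ 

$$m = c_2 / c'_1 = (m\, g^{k'}) / g^{k'}.$$ 

Note that $f(x_l) = 0 \bmod q$. Once $m$ is obtained, Bob can verify the correctness 
of the encryption by checking whether $c_1 = \{g^{k'} (\beta'_0)^k, (\beta'_1)^k, \ldots, (\beta'_{n+1})^k\}$ using $g^{k'}$ and 
$k = H(m)$. 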
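
The way the group polynomial cancels during decryption can be illustrated with a simplified sketch. It is not the full scheme: it assumes the basic tuple $\beta_i = g^{a_i}$ without the tamper-proofing parameters, takes $k$ as a plain integer instead of a hash of $m$, and uses toy parameters and hypothetical function names.

```python
P, Q, G = 23, 22, 5  # toy prime, q = p - 1, primitive root (assumptions)


def poly_from_roots(roots, modulus):
    """Coefficients of prod (x - x_i), lowest degree first."""
    coeffs = [1]
    for root in roots:
        nxt = [0] * (len(coeffs) + 1)
        for j, a in enumerate(coeffs):
            nxt[j] = (nxt[j] - root * a) % modulus
            nxt[j + 1] = (nxt[j + 1] + a) % modulus
        coeffs = nxt
    return coeffs


def group_public_key(member_secrets):
    """Simplified group key: beta_i = g^{a_i} mod p (no tamper-proofing)."""
    return [pow(G, a, P) for a in poly_from_roots(member_secrets, Q)]


def encrypt(m, betas, k_prime, k):
    """c1 = {g^{k'} beta_0^k, beta_1^k, ...}, c2 = m * g^{k'} mod p."""
    c1 = [pow(b, k, P) for b in betas]
    c1[0] = c1[0] * pow(G, k_prime, P) % P
    c2 = m * pow(G, k_prime, P) % P
    return c1, c2


def decrypt(c1, c2, x_l):
    """A member with f(x_l) = 0 recovers g^{k'} and divides it out of c2."""
    mask = 1
    for i, a_i in enumerate(c1):
        mask = mask * pow(a_i, pow(x_l, i, Q), P) % P
    return c2 * pow(mask, -1, P) % P
```

One ciphertext serves every member: each of the secrets 3, 7 and 12 decrypts the same `(c1, c2)` pair, whereas a non-member value leaves a factor $g^{k f(x)}$ that does not cancel.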



4 Distributed Signcryption 



In this section, we present the construction of a signcryption scheme suitable 
for a group of recipients. That is, any member in the system (within or outside 
the group) can signcrypt a message and distribute it to the members of the 
group. We assume that the group is constructed as described in Section 3. The 
encryption is still assumed to be based on the use of symmetric key techniques 
such as DES and the digital signature is assumed to be based on SDSS given 
earlier in Section 2. We will still use the notation $E_k$ to denote symmetric key 
encryption using key $k$ and $D_k$ to denote the corresponding decryption; also, we 
will use a keyed one-way hash function $H_k(\cdot)$. 

In the description of the signcryption protocol below, we assume that Alice is 
the sender who signcrypts a message m and sends the message to the designated 
group. Bob is a member of the designated group and Bob needs to de-signcrypt 
and verify the message. Assume that $x_a$ and $x_b$ are the private keys of Alice and 
Bob, respectively. To signcrypt a message $m$, Alice does the following: 

(1) Chooses a number $x \in_R Z_q$ at random, computes $k' = g^x \bmod p$, and splits $k'$ into $k_1$ 
and $k_2$ as agreed earlier. 

(2) Computes $r = H_{k_2}(m)$. 

(3) Computes $s = x\,(r k' + x_a)^{-1} \bmod q$ if SDSS1 is used, or $s = x\,(k' + x_a r)^{-1} \bmod q$ 
if SDSS2 is used. 

(4) Computes $k = H(m)$. The signcrypted text is as follows: 



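$$c_1 \leftarrow \{a_0, \ldots, a_{n+1}\} = \begin{cases} \{g^{r k'} (\beta'_0)^k, (\beta'_1)^k, \ldots, (\beta'_{n+1})^k\} & \text{(SDSS1)} \\ \{g^{k'} (\beta'_0)^k, (\beta'_1)^k, \ldots, (\beta'_{n+1})^k\} & \text{(SDSS2)} \end{cases}$$ 

$$c_2 = E_{k_1}(m)$$ 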
(5) Alice sends to Bob or the designated group the signcrypted text $(c_1, c_2, r, s)$. 

Any one of the group members can de-signcrypt and verify the signcrypted text 
by recovering the secret key $k'$. The procedure is as follows: 

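(1) Bob recovers $k'$ by computing 
For SDSS1: 

$$k' = \Big( y_a\, a_0 \prod_{i=1}^{n} a_i^{x_l^i}\, a_{n+1}^{\rho_l} \Big)^{s} 
= \big( y_a\, g^{r k'} \big)^{s} \quad \text{where } f(x_l) = 0 \bmod q$$ 
$$= g^x \bmod p$$ 

For SDSS2: 

$$k' = \Big( y_a^r\, a_0 \prod_{i=1}^{n} a_i^{x_l^i}\, a_{n+1}^{\rho_l} \Big)^{s} 
= \big( y_a^r\, g^{k'} \big)^{s} \quad \text{where } f(x_l) = 0 \bmod q$$ 
$$= g^x \bmod p$$ 

(2) Bob splits $k'$ into $k_1$ and $k_2$ as agreed earlier and computes $m = D_{k_1}(c_2)$. 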

Completeness: It is obvious that the recipient can de-signcrypt the message 
by following the decryption process above. The only way to decrypt is to have 
one of the members' private keys, $x_l$, along with $\rho_l$, which gives $F'(x_l) = 0$. 
Soundness: $a_0$ is a ciphertext similar to an ElGamal encryption. Alice has 
to use the group public key, otherwise no one in the group can de-signcrypt the 
message. Alice also has to use her private signing key to compute $s$, since her 
public key is used in the verification of the signcryption. If $s$ does not embed $x_a$, 
$x_a$ cannot be removed in the verification in order to obtain $k'$. 

Proposition 3. Any group member is able to verify the signcrypted text and 
obtain the embedded message. 

In fact, it is not possible to exclude any particular member of the group from 
receiving the signcrypted text (and hence the plaintext) that is intended for the 
group. This can only be achieved by removing the member from the group. 

This scheme is computationally efficient considering the distributed environment 
that requires polynomial computations. Actually, it is obvious that when 
sending a message to a group of $N$ members, only one encryption is required. 
However, $N$ encryption operations are used in the scheme proposed in PP . 
5 Extension 

In this section, we propose a distributed signcryption scheme whereby the system 
members are partitioned into a set of groups and the scheme enables a member of 
one group to signcrypt a message on behalf of the group and send it to another 
member in another group. This distributed signcryption scheme satisfies the 
following: 

— The anonymity of the member who signcrypts the message is preserved. That 
is, no one except the group manager is able to find out the identity of the 
sender. 

— At the end of the protocol, the members of the receiving group are confident 
that the message has been signcrypted by a member of the sending group. 

— The member of the sending group who signcrypted the message is confident 
that the message can only be read by the members of the receiving group. 



5.1 The Protocol 



Consider two designated groups $G_a$ and $G_b$ and assume that Alice, belonging to 
group $G_a$, wishes to send a signcrypted message $m$ to the group $G_b$ and that Bob 
is one of the recipients belonging to group $G_b$. Let Alice's personal public key be 
$y_a = g^{x_a} \bmod p$, where $x_a$ is her group private key satisfying $f_a(x_a) = 0$, where 
$f_a(\cdot)$ is the polynomial function of $G_a$ (defined in Section 3). Similarly, Bob has 
$y_b = g^{x_b} \bmod p$ and $x_b$. The group public keys for $G_a$ and $G_b$ are respectively 

{/3'g, ..., /3'„+i} and {/3g, /3(, ..., Both public keys have the same form 

as those given earlier. 

In order to signcrypt the message, Alice needs to do the following: 



— Chooses an integer, $x$, from $Z_q$. 

— Computes $k' \leftarrow g^x \bmod p$ and splits $k'$ into $k_1$ and $k_2$. 

— Computes $k = H(m)$. 

— Computes the signing commitment, uj <— for (j = 1, ..., n) 

— Computes r = and sj = k{x{^ — ruj) mod q (j = 1, ...,n). 

— The signcryption is then computed as follows: 



Cl 4- 


- {oo, ai, ..., ttn+l} 4 




C2 ^ 


- {oo, a-n+i} ^ {g^ 


- rk' of k Tjf kpa 

PO ’ Pn+1 


C3 ^ 


- Afci(m). 





— Sends the signcryption tuple $(c_1, c_2, c_3, r, s_j)$ to group $G_b$. $(x, k, k', k_1, k_2)$ 
remain secret. 

The public key of the sender or $G_a$ should be known to the recipient or $G_b$. 
The verification process includes the following steps: 
1: Key recovery: The key, $k'$, can be computed by Bob or another member in 
the group $G_b$. 

n n 

k' = [ao n [«0 n 

i=l 3=1 















j > 



i=0 



3=0 



= ( modp) 



(3) 



2: Verification of correctness of oq and oq: This is done by checking oq and oq 
are indeed constructed with /3 'q and /?'§ respectively (Notice that k is now 
known) . 

Once $k'$ is obtained, Bob splits it into $k_1$ and $k_2$, and verifies/computes the 
message. 

Completeness: It is obvious that if Alice follows the signcryption rule, Bob can 
obtain the correct key, $k'$. 

Soundness: There are two points we intend to make: 

— Alice must be a member of $G_a$ and follow the correct signcryption procedure, 
or Bob will not be able to de-signcrypt the signcrypted message. 

— If Bob is not a member of $G_b$, he will not be able to de-signcrypt the message. 

To de-signcrypt the message. Bob needs to obtain k' by removing /3 'q from oq 
and (3q^ from oq. oq and do are constructed similarly to an ElGamal encryption, 
thus the only way to decrypt them is to use group secret keys, in our example $x_b$ 
and $x_a$. 



Proposition 4. The above key discovery process is a witness indistinguishable 
verification of signcrypted knowledge of m. 

Proof. We omit the detailed proof here but the intuitive argument is as follows: 

The verification of the knowledge m is based on nr=5/ ~ Imodp. It is not 

possible to find which Xj is used in the signcryption. In particular, all parameters 
used in the protocol are one-time items and are not linked to the signer; hence 
the use of this information cannot help to determine the identity of the signer. 



6 Conclusion 

In this paper, we proposed a distributed signcryption scheme that allows any 
member in the system to signcrypt a message and distribute it to members of 
a designated group. We then extended the distributed signcryption scheme to 
group signcryption. In this scheme, the system is partitioned into groups of 
members, and the scheme enables a member of one group to signcrypt a message 
on behalf of the group and distribute it securely to members in the other group. 
Any member of the receiving group can verify the signcrypted message. This 
scheme preserves the anonymity of the sender of the message. 

The schemes proposed in this paper are computationally efficient in the dis- 
tributed environment. This is because only one signcryption is needed for a group 
of n recipients. The schemes proposed in this paper further enhance the practical 
applicability of signcryption techniques in electronic commerce. For example, in 
a Pay TV system, the TV programs can be signcrypted and distributed to a 
group of legitimate subscribers. 
Abstract. Security of ordinary digital signature schemes relies on a 
computational assumption. Fail-stop signature (FSS) schemes provide 
security for a signer against a forger with unlimited computational power 
by enabling the signer to provide a proof of forgery, if it occurs. Signing 
long messages using FSS requires a hash function with provable security 
which results in a slow signature generation process. In this paper, we 
propose a new construction for FSS schemes based on linear authentica- 
tion codes which does not require a hash function and results in a much 
faster signing process at the cost of slower verification process, and longer 
secret key and signature. An important advantage of the scheme is that 
proof of forgery is the same as a traditional FSS and does not rely on 
the properties of the hash functions. 



1 Introduction 

Security of an ordinary digital signature relies on a computational assumption, 
that is assuming that there is no efficient algorithm to solve a particular problem. 
This means that if an enemy can solve the underlying hard problem, he can 
successfully forge a signature and there is no way for the signer to prove that 
a forgery has occurred. To provide protection against an enemy with unlimited 
computational power who can always solve the hard underlying problem, fail-stop 
signature (FSS) schemes have been proposed [20,19]. Loosely speaking, an 
FSS is a signature scheme augmented by a proof system which allows the signer 
to prove that a forged signature was not generated by him/her. To achieve this 
property, the signature scheme has many secret keys that correspond to the same 
public key and the sender uses a specific one of them. An unbounded enemy 
who has solved the underlying hard problem and knows the set of all secret keys 
cannot determine which secret key is actually used by the sender. In the case 
of a forgery, that is signing a message with a randomly chosen secret key, the 
sender can use his secret key to generate a second signature for the same message 
which will be different with overwhelming probability from the forged one. The 
two signatures on the same message can be used as a proof that the underlying 
computational assumption is broken and the system must be stopped - hence 
the name fail-stop. Thus, FSS schemes offer information-theoretic security 
for the signer. However, security for the receiver is computational and relies on 
the difficulty of the underlying hard problem. FSS schemes in their basic form 
are one-time primitives and so the key can be used for signing a single message. 

* This work is in part supported by Australian Research Council Grant Number 
A49703076 

FSS schemes and their variants have been studied by numerous authors (see, 
for example, [18,19,16,8,14,17]). 

Signing long messages 

A commonly used method of signing an arbitrarily long message using a traditional 
signature scheme is the hash-then-sign method. Using this 
method, the message is first hashed and then the signature scheme is applied to 
the hash value. In FSS a similar method can be used. However as noted in [9] the 
proof of forgery will not be based on showing that the underlying assumption 
of the signature scheme is broken; rather, it will be by showing that a collision 
for the collision-resistant hash function used for hashing is found. This implies 
that to have an acceptable proof of forgery, a hash function which is based on 
a computational assumption must be used. In m hash functions based on dis- 
crete logarithm and factorization assumption are constructed and it is shown 
that they require on average one multiplication for each bit of the message and 
the size of the hash value is equal to the size of the modulus. This means that 
for long messages FSS schemes have a slow signature generation process (for 
example one million multiplications for a one Mega byte file). 

An alternative approach is to have an FSS scheme that can be directly used 
for arbitrarily long messages. A recently proposed measure [17] of efficiency of FSS 
is redundancy rate, p, which is defined as the length of the signature divided by 
the length of the message. Using FSS for arbitrarily long messages and without 
using hash functions means that FSS with low values for p must be designed. 
The lowest value of p for the known schemes is 1 and is achieved by a scheme 
proposed in m- For other schemes p > 2 and so none of the known schemes 
can be used for direct signing of long messages and the only possible method is 
hash-then-sign. 

Other efficiency measures of FSS schemes are the lengths of the secret key, 
public key and the signature |2]. The most efficient FSS with respect to the first 
three parameters is a discrete logarithm based system due to van Heijst and 
Pedersen m ( or vHP scheme). The scheme in [14| have the same efficiency as 
vHP. 

In this paper we propose a new FSS scheme for which p can be chosen arbitrar- 
ily low and so can be used for direct signing of long messages. This means that 
no hash function is required and so security of the resulting FSS is the same as 
a traditional FSS construction (based on the difficulty of DL in this case). The 
proposed scheme requires much less computation for signing long message com- 
pared to the hash-then-sign method and using provably secure hash functions 
m- More specifically, the signing process in our scheme is around $K/2$ times 
faster than the vHP scheme (using a known provably secure hash function), 
where K is the length of the message in the vHP scheme (at least 151 bits i)- 
The drawback of our scheme compared to the vHP is that the signature verifi- 
cation process is slower. It also requires larger sizes for the secret key, the public 
key and the signature. Table 3 compares various parameters of the two schemes. 
A faster signing process makes the scheme attractive in environments where the 
client, such as a smart card or mobile phone, has limited computational power 
but the host is a powerful computer. The construction follows a general approach 
for constructing FSS schemes from unconditionally secure authentication 
codes (A-codes) proposed in [13] and uses a special class of A-codes, called linear 
A-codes, in which the set of encoding rules, written as vectors over $F_q$, form a 
vector space. These A-codes are of independent interest because of their linearity. 
We give the construction of linear A-codes using linearized polynomials 
over finite fields that can be used to sign arbitrary length messages. Due to the 
lack of space, some details and proofs are omitted. 



Previous Works 

The first construction of fail-stop signature |20j is a one-time signature scheme 
(similar to 0) and results in bit by bit signing of a message which is very 
impractical. In m an efficient single-recipient FSS to protect clients in an on- 
line payment system, is proposed. The main disadvantage of this system is that 
signature generation is a 3-round protocol between the signer and the recipient 
and so it has high communication cost. The size of the signature is twice the 
length of the message. In m, an efficient FSS that uses the difficulty of the 
discrete logarithm problem as the underlying assumption is presented. This is 
the most efficient scheme with respect to the first three parameters and results 
in a signature which is twice the size of the message. For the rest of this paper, 
we refer to this scheme as vHP scheme. In m, another scheme which is nearly 
as efficient as the vHP scheme is proposed. 

In [9,11], a formal definition of FSS schemes is given and a general construction 
using bundling homomorphisms is proposed. The construction has provable 
security and all existing FSS with provable security are instances of this construction. 
It can be proved that for a system with security level $\sigma$ for the signer, 
the signature length and the length of the secret key required for signing a single 
message are at least $2\sigma - 1$ and $2(\sigma - 1)$, respectively. 

In [15], an RSA-based FSS scheme is proposed. The construction follows the 
vHP scheme but is less efficient and produces signatures that are four times the 
length of the original message. 

In [13], a general construction of FSS schemes from authentication codes is 
proposed. It is shown that a scheme that fits into the general construction of [S] 
can also be obtained by using this construction. However, it is not known if the 
two general constructions are equivalent. 
2 Preliminaries 

2.1 Fail-Stop Signature Schemes 

Similar to an ordinary digital signature scheme, an FSS scheme consists of three 
phases: 

1. Key generation: The signer and the center, through a two-party protocol, 
generate a pair of secret key, sk, and public key, pk, and make pk public. 
This is different from ordinary signature schemes where key generation can 
be performed by the signer individually and without the involvement of other 
parties. 

2. Sign: For a message $m$, the signer uses the signature algorithm sign to generate 
the signature $y = sign(sk, m)$, and sends the pair $(m, y)$ to the 
receiver(s). 

3. Test: For a message-signature pair, the receiver(s) uses the public key pk and 
a test algorithm test to check the acceptability of the signature. 

It also includes two more polynomial time algorithms: 

4. Proof: An algorithm for proving a forgery. 

5. Proof-test: An algorithm for verifying that the proof of forgery is valid. 

A secure fail-stop signature scheme must satisfy the following additional properties 
[19,11,9]. 

1. If the signer signs a message, the recipient must be able to verify the signature 
(correctness). 

2. A polynomially bounded forger cannot create forged signatures that successfully 
pass the verification test (recipient's security). 

3. When a forger with unlimited computational power succeeds in forging a 
signature that passes the verification test, the presumed signer can construct 
a proof of forgery and convince a third party that a forgery has occurred 
(signer's security). 

4. A polynomially bounded signer cannot create a signature that he can later 
prove to be a forgery (non-repudiability). 

To achieve the above properties, for each public key there exist many matching 
secret keys such that different secret keys create different signatures on the 
same message. The real signer knows only one of the secret keys, and can construct 
one of the many possible signatures. An enemy with unlimited computing 
power can generate all the signatures but does not know which one 
will be generated by the true signer. Thus, it will be possible for the signer to 
provide a proof of forgery by generating a second signature on the message with 
a forged signature, and use the two signatures to show that the underlying computational 
assumption of the system is broken, hence proving the forgery. 

An FSS in its basic form is a one-time signature scheme that can only be used 
for signing a single message. However, it is possible to extend an FSS scheme to 
be used for signing multiple messages mm- 
Security of an FSS is broken if 1) a signer can construct a signature and later 
provide a proof that it is forged; or 2) an unbounded forger succeeds in constructing 
a signature that the signer cannot prove to be forged. These two types of 
forgeries are independent and so two different security parameters, $\ell$ and $\sigma$, 
are used to show the level of security against the two types of attack. More 
specifically, $\ell$ is the security level of the recipient against the forgery of the 
signer, and $\sigma$ is that of the signer against the unbounded forger. It is proved 
that a secure FSS is secure against adaptive chosen plain-text attack and for all 
$c > 0$ and large enough $\ell$, success probability of a polynomially bounded forger 
is bounded by $\ell^{-c}$. For an FSS with security level $\sigma$ for the signer, the success 
probability of an unbounded forger is limited by $2^{-\sigma}$. In this case we simply call 
the scheme $(\ell, \sigma)$-secure. 

2.2 Authentication Codes 

In the conventional model of unconditionally secure authentication systems, 
there are three participants: a transmitter, a receiver and an opponent. The trans- 
mitter wants to send a message to the receiver using a public channel which is 
subject to active attack. The opponent can impersonate the sender by inserting 
a message into the channel, or substitute a transmitted message with another. 
To protect against these attacks, transmitter and receiver use an authentication 
code (A-code). 

A systematic A-code (or A-code without secrecy) is an A-code where the 
codeword (message) generated for a source state (information to be authenticated) 
is obtained by concatenation of an authenticator (or a tag) to the source 
state. The code can be specified by a triple $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ of finite sets together with 
an (authentication) mapping $f : \mathcal{S} \times \mathcal{E} \to \mathcal{T}$. Here $\mathcal{S}$ is the set of source states, 
$\mathcal{E}$ is the set of keys and $\mathcal{T}$ is the set of authenticators. To send a source state 
$s \in \mathcal{S}$ to the receiver, the transmitter uses his secret key $e \in \mathcal{E}$, which is shared with 
the receiver, to construct a message $m = (s, t)$ where $t = f(s, e) \in \mathcal{T}$. When 
the receiver receives the message $m = (s, t)$, she uses her secret key to check the 
authenticity by verifying whether $t = f(s, e)$. If equality holds, the message $m$ is valid. 

An opponent may insert a message $m' = (s', t')$ into the channel, without observing 
any previous communication, or substitute a message $m = (s, t)$ sent over 
the channel with another message $m' = (s', t')$. The two attacks are called impersonation 
and substitution, respectively. A message $(s, t)$ is valid if there exists 
a key $e$ such that $t = f(s, e)$. We assume that there is a probability distribution 
on the source states, which is known to all the participants. Given this distribution, 
the receiver and the transmitter will choose a probability distribution for $\mathcal{E}$. 
We will denote the probability of success of the opponent in impersonation and 
substitution attack by $P_I$ and $P_S$, respectively. Let $P(\cdot)$ and $P(\cdot \mid \cdot)$ denote the 
probability and conditional probability distribution of the message space $\mathcal{S} \times \mathcal{T}$. 
Then we have 

$$P_I = \max_{s,t} P\big((s,t) \text{ valid}\big)$$ 

and 

$$P_S = \max_{s,t}\ \max_{s' \ne s,\, t'} P\big((s',t') \text{ valid} \mid (s,t) \text{ observed}\big).$$ 

If we further assume that the keys and the source states are uniformly distributed, 
then the deception probabilities can be expressed as 

$$P_I = \max_{s,t} \frac{|\{e \in \mathcal{E} : t = f(s,e)\}|}{|\mathcal{E}|}, \qquad 
P_S = \max_{s,t}\ \max_{s' \ne s,\, t'} \frac{|\{e \in \mathcal{E} : t = f(s,e),\ t' = f(s',e)\}|}{|\{e \in \mathcal{E} : t = f(s,e)\}|}.$$ 
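
Under the uniform-distribution assumption, both deception probabilities can be computed by direct counting. The sketch below does this for a small textbook A-code, $f(s, (a, b)) = a s + b \bmod p$ with $p = 5$; this particular code and the function names are assumptions chosen for illustration and are not the construction proposed in this paper.

```python
from fractions import Fraction
from itertools import product

P = 5                                    # toy field size (assumption)
SOURCES = range(P)                       # source states s
TAGS = range(P)                          # authenticators t
KEYS = list(product(range(P), repeat=2)) # keys e = (a, b)


def f(s, e):
    """Authentication mapping of the example code: t = a*s + b mod p."""
    a, b = e
    return (a * s + b) % P


def impersonation_probability():
    """P_I = max over (s, t) of |{e : t = f(s, e)}| / |E|."""
    return max(
        Fraction(sum(1 for e in KEYS if f(s, e) == t), len(KEYS))
        for s in SOURCES
        for t in TAGS
    )


def substitution_probability():
    """P_S = max over (s, t), (s' != s, t') of the conditional key count."""
    best = Fraction(0)
    for s, t in product(SOURCES, TAGS):
        consistent = [e for e in KEYS if f(s, e) == t]
        if not consistent:
            continue  # (s, t) can never be observed
        for s2, t2 in product(SOURCES, TAGS):
            if s2 == s:
                continue
            hits = sum(1 for e in consistent if f(s2, e) == t2)
            best = max(best, Fraction(hits, len(consistent)))
    return best
```

For this example both functions return 1/5: every valid pair is consistent with exactly five of the 25 keys, and a second pair with a different source state pins the key down uniquely.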



Since the opponent can choose between the two attacks, the overall deception 
probability of an A-code is defined as $P_D = \max\{P_I, P_S\}$. An A-code is $\epsilon$-secure 
if $P_D \le \epsilon$. One of the fundamental results in the theory of A-codes is the square 
root bound [3], which states that $P_D \ge 1/\sqrt{|\mathcal{E}|}$ and the equality holds only if 
$|\mathcal{S}| \le \sqrt{|\mathcal{E}|} + 1$. The square root bound gives a direct relation between the key 
size and the protection that we can expect to obtain. 

A general construction of FSS from A-codes is given in m- The construction 
is for a single message and uses two families of A-codes,
$A = \{(\mathcal{S}_K, \mathcal{T}_K, \mathcal{E}_K) : K \in N\}$ and
$A' = \{(\mathcal{S}_K, \mathcal{T}'_K, \mathcal{E}'_K) : K \in N\}$, a family of
polynomial-time collision-intractable bundling hash functions
$H = \{h_K : K \in N\}$, where $h_K : \mathcal{E}_K \to \mathcal{E}'_K$, and a family of
polynomial-time collision-intractable hash functions $H' = \{h'_K : K \in N\}$, where
$h'_K : \mathcal{T}_K \to \mathcal{T}'_K$, with the property that for any choice of key
$K$ and for an arbitrary $e \in \mathcal{E}_K$ the following is satisfied for all
$s \in \mathcal{S}_K$:

if $e(s) = t$ and $h_K(e) = e'$, then $e'(s) = t'$ and $h'_K(t) = t'$.



C Construction of FSS from Linear A-Codes 

In this section, we introduce a new class of A-codes, called linear A-codes, and
present a construction of FSS schemes that combines linear A-codes with a one-way
function $f_{p,g}$ based on the discrete logarithm.

In the rest of the paper, let $p$ be a prime and $\mathbb{F}_p$ be the finite field of
order $p$ (we may regard $\mathbb{F}_p$ as $Z_p$). We also use $V(n, q)$ to denote an
$n$-dimensional vector space over a finite field of order $q$.

Let $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ be an authentication code with the
authentication mapping $f : \mathcal{S} \times \mathcal{E} \to \mathcal{T}$. To each
source state $s \in \mathcal{S}$, we associate a mapping $f_s$ from $\mathcal{E}$ to
$\mathcal{T}$ defined by $f_s(e) = f(s, e)$ for all $e \in \mathcal{E}$. Then the family
$\{f_s \mid s \in \mathcal{S}\}$ completely specifies the underlying A-code
$(\mathcal{S}, \mathcal{T}, \mathcal{E})$. For our purpose, we shall require that the
functions $f_s$ have some additional properties, defined as follows.

Definition 1. An A-code $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ with the
authentication mapping $f : \mathcal{S} \times \mathcal{E} \to \mathcal{T}$ is called
linear (over $\mathbb{F}_p$) if

1. Both $\mathcal{E}$ and $\mathcal{T}$ are linear spaces over $\mathbb{F}_p$;

2. For each $s \in \mathcal{S}$, $f_s$ is $\mathbb{F}_p$-linear from $\mathcal{E}$ to $\mathcal{T}$.

In the sequel, we assume that $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ with
authentication mapping $f$ is a linear A-code with $\mathcal{E} = V(u, p)$ and
$\mathcal{T} = V(v, p)$, where $p$ is a prime. We assume that $\mathcal{S}$ is a subset
of the $u \times v$ matrices over $\mathbb{F}_p$, so that $f_s$ can be defined as
$f_s(e) = es = t$, where $e \in V(u, p)$ and $t \in V(v, p)$.

To construct our FSS scheme we combine a linear A-code and a one-way function based
on the discrete logarithm problem, given by

$$f_{p,g} : x \mapsto g^x \bmod q,$$

where $p \mid q - 1$. The construction works as follows.

- Prekey Generation: The center selects primes $q$ and $p$ such that $p \mid q - 1$ and
a cyclic subgroup $H_p$ of $\mathbb{F}_q^*$ of order $p$ such that the discrete
logarithm problem in $H_p$ is hard. The center also chooses two elements
$g, h \in H_p$ and publishes $(p, q, g, h)$.

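- Key Generation: The signer chooses a secret key $sk$ which consists of two
authentication keys of a linear A-code $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ over
$\mathbb{F}_p$. That is, $sk = (e, e')$ with $e, e' \in \mathcal{E}$, where
$e = [e_1, \ldots, e_u]$, $e' = [e'_1, \ldots, e'_u]$ and $e_i, e'_i \in \mathbb{F}_p$
for all $i$. The corresponding public key $pk = g^e \odot h^{e'}$ is defined as

$$g^e \odot h^{e'} = [g^{e_1} h^{e'_1}, \ldots, g^{e_u} h^{e'_u}] \pmod q = [pk_1, \ldots, pk_u].$$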

- Sign: To sign a message $s \in \mathcal{S}$, the signer applies the authentication
code $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ to generate two authentication tags $t$
and $t'$ corresponding to the keys $e$ and $e'$. That is, the signature for a message
$s$ is

$$(s, f(s, e), f(s, e')) = (s, es, e's) \pmod p = (s, t, t') = (s, [t_1, \ldots, t_v], [t'_1, \ldots, t'_v]).$$



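- Test: For a message

$$s = \begin{pmatrix} s_{1,1} & s_{1,2} & \cdots & s_{1,v} \\ s_{2,1} & s_{2,2} & \cdots & s_{2,v} \\ \vdots & \vdots & & \vdots \\ s_{u,1} & s_{u,2} & \cdots & s_{u,v} \end{pmatrix},$$

$(s, t, t')$ is a valid signed message iff for all $1 \le i \le v$,

$$g^{t_i} h^{t'_i} = (pk_1)^{s_{1,i}} (pk_2)^{s_{2,i}} \cdots (pk_u)^{s_{u,i}} \pmod q.$$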

- Proof of Forgery: If there is a forged signature $(\tilde{t}, \tilde{t}')$ on a
message $s$, then the presumed signer can produce his own signature on the same
message, namely $(t, t')$, and show that these two signatures collide.



Theorem 1. Let $\ell$ be the security parameter of the underlying discrete logarithm
problem. If the linear A-code $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ is
$\epsilon$-secure, then the above construction results in an $(\ell, \sigma)$-secure
FSS scheme, where $\sigma = \log(1/\epsilon)$.

Example 1. Let $\mathcal{S} = \mathcal{T} = \mathbb{F}_p$ and
$\mathcal{E} = \mathbb{F}_p \times \mathbb{F}_p$. We define a function
$f : \mathcal{S} \times \mathcal{E} \to \mathcal{T}$ by $f(s, (e_1, e_2)) = e_1 + s e_2$.
Then it is easy to verify that $(\mathcal{S}, \mathcal{T}, \mathcal{E})$ is a linear
A-code over $\mathbb{F}_p$. In terms of matrix representation, we can write a source
state $s$ as a $2 \times 1$ matrix over $\mathbb{F}_p$,

$$\mathcal{S} = \left\{ s = \begin{pmatrix} 1 \\ w \end{pmatrix} \;\middle|\; w \in \mathbb{F}_p \right\},$$

and the authentication mapping $f$ as
$f(s, (e_1, e_2)) = (e_1, e_2) \begin{pmatrix} 1 \\ w \end{pmatrix} = e_1 + w e_2$.

The FSS based on this linear A-code is the same as the vHP scheme.

D Construction of Linear A-Codes 

Although the underlying linear A-code of the construction in Example 1 is nearly
optimal (i.e., nearly meets the square root bound), it requires the key to be double
the size of the source, i.e., $\log |\mathcal{E}| = 2 \log |\mathcal{S}|$, and so the
size of the key grows linearly with the size of the source. In the following, we
construct linear A-codes that do not meet the bound and so are not optimal, but allow
the size of the source to be much larger than the size of the keys.

A polynomial of the form

$$L(x) = \sum_{i=0}^{n} a_i x^{p^i}$$

with coefficients in an extension field $\mathbb{F}_{p^m}$ of $\mathbb{F}_p$ is called a
$p$-polynomial over $\mathbb{F}_{p^m}$. If the value of $p$ is fixed once and for all or
is clear from the context, it is also called a linearized polynomial. It is well known
that if $F$ is an arbitrary extension field of $\mathbb{F}_{p^m}$ and $L(x)$ is a
linearized polynomial over $\mathbb{F}_{p^m}$, then

$$L(\beta + \gamma) = L(\beta) + L(\gamma) \quad \text{for all } \beta, \gamma \in F,$$

$$L(c\beta) = cL(\beta) \quad \text{for all } c \in \mathbb{F}_p \text{ and all } \beta \in F.$$

Thus, if $F$ is considered as a vector space over $\mathbb{F}_p$, then the linearized
polynomial $L(x)$ induces a linear operator on $F$.

Next we construct a linear A-code from linearized polynomials. We note that
linearized polynomials have also been used to construct authentication codes for
non-trusting parties and message authentication codes for multiple authentication
[12]. Let $p$ be a prime and assume:

- $\mathcal{S} = \{L_s(x) = \sum_{i=0}^{k-1} a_i x^{p^i} \mid a_i \in \mathbb{F}_{p^r}\}$;
that is, we let each source state $s \in \mathcal{S}$ correspond to a linearized
polynomial over $\mathbb{F}_{p^r}$, denoted by $L_s(x)$. We use the linearized
polynomials up to degree $p^{k-1}$, resulting in $(p^r)^k$ different polynomials, and
so $|\mathcal{S}| = p^{rk}$.

- $\mathcal{E} = \{(e_1, e_2) \mid e_1, e_2 \in \mathbb{F}_{p^r}\}$, and so $|\mathcal{E}| = p^{2r}$.

- $\mathcal{T} = \mathbb{F}_{p^r}$.

- The authentication mapping $f : \mathcal{S} \times \mathcal{E} \to \mathcal{T}$ is
defined by $f(L_s(x), (e_1, e_2)) = e_1 + L_s(e_2)$.



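Theorem 2. The above construction results in a linear A-code
$(\mathcal{S}, \mathcal{T}, \mathcal{E})$ over $\mathbb{F}_p$ with the following
parameters:

$$|\mathcal{S}| = p^{rk}, \quad |\mathcal{E}| = p^{2r}, \quad |\mathcal{T}| = p^{r}$$

and

$$P_I = p^{-r}, \quad P_S = p^{-(r-k+1)}.$$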
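The parameters of the linearized-polynomial A-code can be checked by exhaustive counting on a very small instance. The sketch below is an illustration under assumptions that are not in the paper: it fixes $p = 2$, $r = 3$, $k = 2$, represents $\mathbb{F}_{2^3}$ as integers 0..7 reduced modulo $x^3 + x + 1$, and uses hypothetical helper names (`gf_mul`, `L`, `count`).

```python
from fractions import Fraction

R, K = 3, 2                 # field GF(2^R); source polynomials a0*x + a1*x^2 (K coefficients)
MOD = 0b1011                # x^3 + x + 1, irreducible over GF(2)
N = 1 << R                  # field size 2^R

def gf_mul(a, b):
    """Multiply two elements of GF(2^R), reducing modulo MOD."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & N:
            a ^= MOD
    return result

def L(coeffs, x):
    """Evaluate sum_i coeffs[i] * x^(2^i); addition in GF(2^R) is XOR."""
    acc = 0
    for a in coeffs:
        acc ^= gf_mul(a, x)
        x = gf_mul(x, x)    # Frobenius map x -> x^2
    return acc

sources = [(a0, a1) for a0 in range(N) for a1 in range(N)]
keys = [(e1, e2) for e1 in range(N) for e2 in range(N)]
# tags[s][i] = f(L_s, keys[i]) = e1 + L_s(e2)
tags = {s: [e1 ^ L(s, e2) for e1, e2 in keys] for s in sources}

def count(items):
    """Histogram of the items in an iterable."""
    table = {}
    for item in items:
        table[item] = table.get(item, 0) + 1
    return table

P_I = max(Fraction(c, len(keys)) for s in sources for c in count(tags[s]).values())

P_S = Fraction(0)
for s in sources:
    seen = count(tags[s])                         # number of keys consistent with (s, t)
    for s2 in sources:
        if s2 == s:
            continue
        joint = count(zip(tags[s], tags[s2]))     # keys consistent with (s, t) and (s2, t2)
        for (t, _t2), c in joint.items():
            P_S = max(P_S, Fraction(c, seen[t]))

print(P_I, P_S)
```

With these toy values the counts agree with $P_I = p^{-r}$ and $P_S = p^{-(r-k+1)}$: a nonzero difference of two source polynomials has at most $p^{k-1}$ roots, which is what limits the substitution attack.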

Combining Theorems 1 and 2, we obtain the following corollary.

Corollary 1. Let $p$ and $q$ be primes such that $\ell$ is the required security
parameter, so that a polynomially bounded forger cannot break the underlying discrete
logarithm problem. Then the above linear A-code results in an $(\ell, \sigma)$-secure
FSS scheme such that $\sigma = (r - k + 1) \log p$.

E Efficiency Measures of FSS Schemes 

In this section, we compare the efficiency of our proposed scheme with the most
efficient FSS scheme, namely the vHP scheme. First, we fix the level of security
provided by the two schemes, and then we find the sizes of the secret key, the
public key and the signature. Table 1 gives the results of the comparison of FSS
schemes when the security levels of the receiver and the sender are given by $\ell$ and
$\sigma$, respectively. In this comparison, the first two schemes [18, 19] (first and
second columns of the table) are chosen because they have provable security. The first
scheme (referred to as the vHP scheme in this paper) is the most efficient provably
secure scheme based on the discrete logarithm problem. The third column is an FSS
scheme based on RSA. The fourth column is a factorization-based scheme proposed in
[11]. Column five corresponds to the scheme from Theorem 2 with $r = k$.

In the vHP scheme, given the security parameters $(\ell, \sigma)$, first
$K = \max(\ell, \sigma)$ is found and then the prime $p$ is chosen such that
$\log_2 p \ge K$. The value of $q$ is chosen such that $p \mid q - 1$ and $(q - 1)/p$ is
upper-bounded by a polynomial in $K$ (pages 237 and 238 of the cited reference). Since
the sizes of $p$ and $q$ can be independently chosen, we use $\hat{K}$ to denote
$\log_2 q$.

For our scheme, given the security parameters $(\ell, \sigma)$, first
$K = \max(\ell, \sigma)$ is found, and then the prime $p$ is chosen such that
$\log_2 p \ge K$. We also use the notation $\hat{K}$ to denote the size of $q$.

In the factorization scheme of the second column of Table 1, the security level of the
sender, $\sigma$, satisfies $\tau = \rho + \sigma$, where $\tau$ is the bundling degree
(which determines the number of secret key preimages for a particular public key
image) and $2^\rho$ is the size of the message space. The security parameter of the
receiver, $\ell$, is determined by the difficulty of factoring the modulus $n$ (where
$n = pq$ and $p$ and $q$ are both prime numbers). Now for a given pair of security
parameters $(\ell, \sigma)$, the size of the modulus $N_\ell$ is determined by $\ell$,
but determining $\tau$ requires knowledge of the size of the message space. Assume
$\rho = \log_2 p \approx \log_2 q = N_\ell/2$. This means that
$\tau = \sigma + N_\ell/2$. Now the efficiency parameters of the system can be given as
shown in the table. In particular, the sizes of the secret and public keys are
$2(\tau + N_\ell)$ and $2N_\ell$, respectively.

In the RSA-based FSS scheme (third column), $\tau$ is the bundling degree and is
defined as $\tau = \log_2 \phi(n)$, and the security of the receiver is determined by
the difficulty of factoring $n$ (where $n = pq$ and $p$ and $q$ are both prime
numbers). This means that $\tau \approx \log_2 n$. To design a system with security
parameters $(\ell, \sigma)$, first $N_\ell$, the modulus size that provides security
level $\ell$ for the receiver, is determined and then $K = \max(\sigma, N_\ell)$. The
modulus $n$ is chosen such that $\log_2 n = K$. With this choice, the system provides
adequate security for sender and receiver.

In the factorization scheme of [11], the security level of the sender is $\log_2 q$.
The security level of the receiver is determined by the difficulty of the discrete
logarithm in $Z_P$ and the factorization of $n$. First, $N_\ell$, which is the modulus
size for which factorization has difficulty $\ell$, is chosen. Then
$K = \max(N_\ell, \sigma)$ is calculated. Since the size of $P$ can be chosen much
greater than $\log_2 n$, we use $\hat{K}$ to denote $\log_2 P$.





                          vHP [18]   Fact. [19]       RSA    Fact. [11]   Our scheme
Message Size              K          2P               K      2K           $r^2 K$
Length of Secret Key      4K         $4K + 2\sigma$   4K     4K           4rK
Length of Public Key      2K         2K               2K     2K           $2r\hat{K}$
Length of Signature       2K         $2K + \sigma$    4K     2K           2rK
Underlying Security       DL         Fact             Fact   Fact & DL    DL
Assumption

Table 1. Efficiency Parameters Comparison



Note that, as we have pointed out in Example 1, if $r = k = 1$, then our scheme
coincides with the vHP scheme.

Efficiency with respect to the message-length 

We also need to consider the relative lengths of the message and the signature.
If the lengths of the signature and the message are denoted by $|y|$ and $|x|$,
respectively, then $\rho = |y|/|x|$ is a measure of the communication efficiency of the
scheme.

As pointed out in Table 1, in the vHP scheme messages are of length $\log_2 p$ and
signatures are of length $2 \log_2 q$. This means that $\rho = 2$, and so for every bit
of authenticated message, 2 bits of signature are required. In our scheme, messages
and signatures are of size $r^2 \log_2 p$ and $2r \log_2 p$, respectively. Hence
$\rho = 2/r$, which is less than or equal to 2 (depending on $r$). In fact, our scheme
is the only FSS that allows $\rho$ to change with the parameters of the system. Table 2
summarizes these results.

            vHP    Fact. [19]   RSA    Fact. [11]   Our scheme
    $\rho$  2      > 2          4      1            2/r

Table 2. Comparison of Communication Efficiency with respect to the Message-Length

Signing long messages 

To sign long messages one can use (i) our proposed FSS, or (ii) a hash-then-sign
approach using one of the existing FSS schemes. In the following we compare two
schemes: one based on the construction presented in this paper, and the other using
the vHP scheme and a provably secure hash function proposed in [2, 3]. The hash
function requires on average one multiplication for each bit of the message, and
the size of the hash value is equal to the size of the modulus. We assume that
the length of the message is $r \times kK$ bits for some integers $r$ and $k$ such that
$r \ge k$, where $K = \log_2 p$ and $\hat{K} = \log_2 q$.





                                     Hash-then-sign approach*   Our Scheme
Sign (number of multiplications)     ≈ rkK + 2                  2rk
Test (number of multiplications)     < 2K                       < (2 + k)rK
Length of Secret Key                 4K                         4kK
Length of Public Key                 2K                         $2r\hat{K}$
Length of Signature                  2K                         2rK
Underlying Security Assumption       Collision-Resistant        DL
                                     Hash Function & DL

Table 3. Complexity of the two FSS approaches for signing long messages.

Note: * Hashing uses a provably secure hash function proposed in [2, 3] and
signing uses the vHP scheme [18].

The table shows that signing using the new construction is $K/2$ times faster,
while verification is approximately $rk/2$ times slower. For example, to achieve
adequate security, we choose $K = 151$ bits and $\hat{K} = 1881$ bits. Also, to
simplify the comparison, we assume $r = k$. To sign a 1 megabyte message using the
hash-then-sign approach (i.e. using the vHP scheme with a secure hash function), the
number of multiplications required for signing and testing are 1,065,458 and 302,
respectively. However, by using the proposed approach, the number of multiplications
required for signing and testing are 14,112 and 1,090,824, respectively. This
asymmetry between the amount of computation required for signing and verification is
useful in applications where the signer has limited computing power, for example a
smart card, and the verifier has a powerful server, for example a bank.
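The quoted multiplication counts follow from the formulas in Table 3. The sketch below evaluates them; the value $r = k = 84$ is an assumption inferred here because it reproduces the quoted figures (the text does not state it), and the function name `multiplication_counts` is hypothetical.

```python
def multiplication_counts(r, k, K):
    """Evaluate the Table 3 cost formulas for a message of r*k*K bits."""
    return {
        "message_bits": r * k * K,
        "hash_then_sign": {"sign": r * k * K + 2, "test": 2 * K},
        "linear_a_code": {"sign": 2 * r * k, "test": (2 + k) * r * K},
    }

# Assumed parameters: K = 151 as in the text, r = k = 84 inferred from the figures.
counts = multiplication_counts(r=84, k=84, K=151)
print(counts)
```

Under these assumed values the message length $rkK$ comes to 1,065,456 bits, and the signing ratio $rkK / (2rk)$ is roughly $K/2$, matching the speed-up stated above.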

F Conclusion 

We introduced a new class of A-codes, called linear A-codes, which allow efficient
signing (FSS) of long messages. An important property of the scheme is that $\rho$ is
a design parameter of the system, and so by choosing it to be small, arbitrary-length
messages can be directly signed without the need for hashing before signing.

Abstract. We apply power analysis to known elliptic curve cryptosystems, and
consider an exact implementation of scalar multiplication on elliptic curves that
resists power attacks. Our proposed algorithm does not decrease the computational
performance compared to the conventional scalar multiplication algorithm, whereas
previous methods either cost performance or fail to protect against power analysis
attacks.

Keywords: Elliptic Curve Cryptosystem, Power Analysis, Timing Attack,
Montgomery-form, Efficient Implementation, Scalar Multiplication Algorithm



A Introduction 

A.1 Recent Works for Protecting against Power Attacks on ECC

The usage of Montgomery-like scalar multiplications on ECC. While investigating
efficient scalar multiplication algorithms on elliptic curves, some researchers
[LD99, LH00, OKS00] have independently observed an advantage of Montgomery's method
[Mon87] for preventing timing attacks.

- Lopez, J. and Dahab, R. [LD99] presented a fast algorithm for computing scalar
multiplications on elliptic curves defined over $GF(2^n)$, which is based on
Montgomery's method, and remarked that "Since the complexity of both versions of
Algorithm 2 does not depend on the number of 1's (or 0's) in the binary
representation of k, this may help to prevent timing attacks."

- Lim and Hwang [LH00] mentioned on Montgomery's method that "..., and the execution
time does not depend on the Hamming weight of multipliers, which helps to prevent
timing attacks."

- Okeya, Kurumatani, and Sakurai [OKS00] proposed elliptic curve cryptosystems from
the Montgomery-form $BY^2 = X^3 + AX^2 + X$ secure against timing attacks with the
technique of randomized projective coordinates.



B. Roy and E. Okamoto (Eds.): INDOCRYPT 2000, LNCS 1977, pp. 178-190, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000

Coron's generalization of DPA to ECC and suggested countermeasures. Coron [Cor99]
generalized DPA to elliptic curve cryptosystems, while the previous power analysis
attacks were mainly against DES [DES1] or RSA [RSA78].^1 Moreover, Coron [Cor99]
suggested three countermeasures against DPA on elliptic curve cryptosystems:
(1) randomization of the private exponent, (2) blinding the point P, (3) randomized
projective coordinates.



A.2 Our Contributions

We show that the previous scalar multiplication algorithms [LD99, LH00, OKS00] are
vulnerable to power attacks (if implemented carelessly) by finding the input
dependency in the executing procedure of scalar multiplication.

We also discuss some weaknesses of Coron's countermeasures against DPA. This paper
claims that Coron's 1st countermeasure fails to break the dependence of the executing
procedure on the secret key, and that Coron's 2nd countermeasure does not succeed in
randomizing the expression of computed/computing objects. Though Coron's 3rd
countermeasure looks perfect, we also clarify an implicit assumption in the
implementation of Coron's multiplication algorithm designed to be secure against SPA,
and remark that SPA could be applicable to this algorithm if it were implemented
incorrectly.

Then we consider how to implement the scalar multiplication algorithm so that it
resists power attacks. A hybrid method, an exact implementation of the Montgomery
algorithm with Coron's randomization technique, is shown to be secure against power
attacks.

We further show that our proposed algorithm does not decrease the computational
performance compared to the conventional scalar multiplication algorithm, whereas
previous methods cost performance and/or fail to protect against power analysis
attacks. Note that if we carefully implement Coron's method, power attacks could be
prevented. However, Coron's method still costs computational performance: even the
fastest algorithm by Coron achieves less than half the performance of the
conventional one.

A.3 Applied Elliptic Curve Cryptosystems without the y-Coordinate

Our proposed algorithm considers only the x-coordinate of $kP$, the $k$-times scalar
multiplication of the point $P$ on the elliptic curve. However, this is enough for
some elliptic curve cryptosystems, including the encryption scheme ECES, the
key-establishment scheme ECDH, and the signature generation ECDSA-S [IEEEp1363]. We
should note that Ohgishi, Sakai and Kasahara [OSK99] developed an implementation of
the verifying algorithm of ECDSA without referring to the y-coordinate of $kP$. Thus,
our algorithm is also applicable to such a signature scheme.



^1 Some recent works on DPA are against AES candidates [DPA00, Mes00].

B Power Attacks

Power attacks arise from the actual implementation of cryptosystems, which differs
from the ideal cryptosystems. Information leaks, in addition to input and output
data, while cryptographic devices (e.g. smart cards) execute cryptographic
transactions (e.g. signature, encryption, decryption). An attacker may use these
leakages for his estimate.

In 1996, Kocher proposed the timing attack [Koc, Koc96]. The timing attack is one of
the power attacks, in which an attacker uses the execution time for his estimate of
the secret key. Recently, Kocher et al. proposed DPA (Differential Power Analysis)
and SPA (Simple Power Analysis) [KJJ98, KJJ99]. DPA is a power attack in which an
attacker uses power consumption and analyzes such data statistically, and SPA is an
attack without statistical analysis. Coron [Cor99] generalized DPA to elliptic curve
cryptosystems with the following SPA-immune scalar multiplication algorithm.^2

Algorithm 1: SPA-immune algorithm
INPUT a scalar value $d$ and a point $P$.
OUTPUT the scalar multiplication $dP$.

1. $Q[0] \leftarrow P$
2. For $i$ from $l - 2$ to 0 do the following ($|d| = l$):
   2.1 $Q[0] \leftarrow 2Q[0]$
   2.2 $Q[1] \leftarrow Q[0] + P$
   2.3 $Q[0] \leftarrow Q[d_i]$
3. Output $Q[0]$
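Algorithm 1 can be exercised on a small curve to confirm that it returns $dP$ while performing one doubling and one addition in every round. The sketch below is a minimal illustration, assuming the toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ with base point $(3, 6)$ and affine arithmetic; the curve, the point and the helper names are assumptions made here, and Python list indexing is of course not a side-channel-safe selection.

```python
P_MOD, A, B = 97, 2, 3     # toy curve y^2 = x^3 + A*x + B over F_97 (assumed parameters)
INF = None                 # point at infinity

def add(P, Q):
    """Affine point addition (including doubling) on the toy curve."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD

def double_and_add_always(d, P):
    """Algorithm 1: every round performs a doubling and an addition, then selects."""
    bits = bin(d)[2:]                  # d_{l-1} ... d_0, with d_{l-1} = 1
    Q = [P, INF]
    for bit in bits[1:]:               # i = l-2 down to 0
        Q[0] = add(Q[0], Q[0])         # step 2.1
        Q[1] = add(Q[0], P)            # step 2.2
        Q[0] = Q[int(bit)]             # step 2.3
    return Q[0]

def naive_multiple(d, P):
    """Reference result: add P to itself d times."""
    result = INF
    for _ in range(d):
        result = add(result, P)
    return result

G = (3, 6)
print(double_and_add_always(29, G) == naive_multiple(29, G))
```

The number of curve operations depends only on the bit-length of $d$, which is the property that makes the operation sequence uninformative to SPA; it does not by itself randomize the intermediate values.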

C Power Analysis Attacks on Known ECC-Schemes

C.1 The Point of DPA Attack

In this subsection, for the purpose of constructing cryptosystems with immunity
to DPA, we explain how an attacker estimates the secret key in a DPA attack. The
attack exploits two things: "a difference between executing procedures
(non-symmetry)" and "an appearance of a predicted special value".

First, the executing procedure of a typical cryptographic transaction depends on the
secret key. Consequently, the executing procedure differs from secret key to secret
key. If an attacker finds the difference in the executing procedure from leakages, he
is able to derive information on the secret key. In practice, since it is hard to
observe the difference directly, he treats the leakages statistically to amplify the
bias, and finally finds the difference in the executing procedure.

Next, if the appearance of some specific value in the cryptographic transaction
depends on the secret key, an attacker is able to detect the secret key by whether or
not the value appears during the execution. That is, in a typical cryptographic
transaction a specific value appears in the middle of the computation for some secret
key, while it does not appear for another secret key. If it is possible for the
attacker to predict the specific value from known values beforehand, he is able to
detect the secret key from its appearance. In Coron's DPA attack on elliptic curve
cryptosystems, the predicted specific value is the coordinates of the point $4P$. The
predicted specific value does not mean the logical value, but the expression of the
value. For example, two fractions that reduce to the same number are logically the
same, but their expressions are different.

^2 We denote by $|d|$ the bit-length of the scalar value $d$ and by $d_i$ the $i$-th bit of $d$.

Therefore, for the purpose of resisting power analysis attacks such as DPA, we have
to do the following two things.

1. Break the dependence between the secret key and the executing procedure of the
cryptographic transaction.

2. Randomize the expression of computed/computing objects.

In the case of elliptic curve cryptosystems, the secret key is a scalar value, and
the calculation in the cryptographic transaction is the scalar multiplication. An
executing procedure that is invariant for any value of the secret key is an example
of no dependence. Therefore, to resist DPA, the computing procedure of scalar
multiplication should be invariant for any scalar value and should use a randomized
expression.

C.2 Analysis of Montgomery-Variants

Attack on Lopez-Dahab's method. Lopez, J. and Dahab, R. [LD99] presented a fast
algorithm for computing scalar multiplications on elliptic curves defined over
$GF(2^n)$, which is based on Montgomery's method. Their method has two versions: one
uses affine coordinates (referred to as LD2A) and the other uses projective
coordinates (referred to as LD2P).

Algorithm LD2A has immunity against timing attacks. This is because the computation
amount does not depend on the value $k_i$: the calculations of $x + t^2 + t$ and
$x_j^2 + b/x_j^2$ ($j = 1$ or 2) are executed both for $k_i = 0$ and for $k_i = 1$.
However, we show that LD2A is vulnerable against DPA (even against SPA). In Step 4,
LD2A first computes $x + t^2 + t$, then $x_2^2 + b/x_2^2$ in the case of $k_i = 1$,
whereas LD2A first computes $x_1^2 + b/x_1^2$, then $x + t^2 + t$ in the case of
$k_i = 0$. Therefore, the executing procedure of the calculation depends on the value
$k_i$.^3

Algorithm LD2P is immune against timing attacks. This is because Step 4 of LD2P
executes the computation of Madd and Mdouble in both cases, $k_i = 1$ and $k_i = 0$.^4
Thus, the executing procedure of LD2P does not depend on the value $k_i$, so there is
no dependence between the secret key and the executing procedure of the cryptographic
transaction. Hence, LD2P is immune against SPA as well. However, LD2P does not use the
trick of randomizing the expression of computed/computing objects. Therefore, LD2P
is vulnerable against DPA.

^3 The order in which $x + t^2 + t$ and $x_j^2 + b/x_j^2$ are computed is not a problem
in itself, because the order does not affect the computation result. Therefore, the
problem can be removed by always computing $x + t^2 + t$ first and then
$x_j^2 + b/x_j^2$, which does not depend on the value of $k_i$.

^4 We should note that Madd must be computed before Mdouble. If we compute Mdouble
before Madd, the value $(X_2, Z_2)$ or $(X_1, Z_1)$ could change, and therefore the
output of Madd would be wrong.

Attack on the Method by Okeya et al. Okeya, Kurumatani and Sakurai already remarked
that the scalar multiplication algorithm on Montgomery-form elliptic curves is immune
against timing attacks [OKS00]. However, the immunity against DPA depends on the
implementation; that is, some implementations are vulnerable to DPA.

The computation of the scalar multiplication on Montgomery-form elliptic curves
repeatedly computes $(2mP, (2m + 1)P)$ or $((2m + 1)P, (2m + 2)P)$ from
$(mP, (m + 1)P)$, depending on the value of each bit of the scalar value [Mon87]. Each
computation requires one addition and one doubling on the elliptic curve. Since the
addition and the doubling are independent, the order of the computation is flexible.
Okeya et al. did not mention the importance of the order [OKS00]. For example, assume
that we implement the algorithm such that we "execute addition, and then execute
doubling" if the bit is 0 and "execute doubling, and then execute addition" if it is
1. Then an attacker is able to learn the order of the computation, that is, whether
the addition or the doubling comes first, by a DPA (or SPA) attack. Since the addition
comes first when the bit is 0 and the doubling comes first when the bit is 1, he is
able to derive the bit. Consequently, this implementation is vulnerable to DPA (and
SPA).
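The order-of-operations leak described above can be illustrated with a simulation. The sketch below makes several simplifying assumptions that are not in the paper: points are modelled as plain integers (so $kP$ is just `k * P`), the side channel is modelled as a list of operation labels `"A"` (addition) and `"D"` (doubling), and the function names are hypothetical.

```python
def ladder(scalar, P, leaky):
    """Montgomery ladder on integers; returns (scalar * P, observed operation order)."""
    bits = bin(scalar)[2:]
    lo, hi = P, 2 * P                      # state (mP, (m+1)P) with m = 1
    trace = []
    for bit in bits[1:]:
        total = lo + hi                    # addition: (2m+1)P
        doubled = 2 * (hi if bit == "1" else lo)   # doubling: 2mP or (2m+2)P
        if leaky and bit == "1":
            trace += ["D", "A"]            # doubling first when the bit is 1
        else:
            trace += ["A", "D"]            # addition first otherwise
        lo, hi = (total, doubled) if bit == "1" else (doubled, total)
    return lo, trace

def recover_bits(trace):
    """Attacker's view: read each round's first operation to recover the bit."""
    firsts = trace[0::2]
    return "1" + "".join("1" if op == "D" else "0" for op in firsts)

scalar = 0b1011010011
result, leaky_trace = ladder(scalar, 7, leaky=True)
_, fixed_a = ladder(0b1011010011, 7, leaky=False)
_, fixed_b = ladder(0b1100101100, 7, leaky=False)
print(recover_bits(leaky_trace), fixed_a == fixed_b)
```

In the bit-dependent ordering the trace determines every bit after the leading one, while the fixed ordering produces the same trace for any two scalars of equal length, which is the first of the two requirements stated in C.1.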

C.3 Analysis of Coron's Countermeasures [Cor99]

Countermeasure 1: randomization of the private exponent.^5 This countermeasure has
the following weakness. There exists some bias in $d'_i$ which depends on the scalar
value $d$ for $i > n$, although this countermeasure uses a random number $k$. Thus, an
attacker is able to estimate $d$ by statistical analysis of those values. That is, it
does not satisfy the 1st requirement.

We give an example of an attack which exploits this weakness. For simplicity, we
assume $|k| = 2$, that is, the possible values of $k$ are 00, 01, 10 and 11. For
simplicity again, we assume that the lower three bits of $\#E$ are equal to 001.^7
Then the possibilities of $d'_2 d'_1 d'_0$ are given by the following table.



    $d_2 d_1 d_0$                 000   001   010   011   100   101   110   111
    $d'_2 d'_1 d'_0$ ($k = 00$)   000   001   010   011   100   101   110   111
                     ($k = 01$)   001   010   011   100   101   110   111   000
                     ($k = 10$)   010   011   100   101   110   111   000   001
                     ($k = 11$)   011   100   101   110   111   000   001   010
    rate of $d'_2 = 1$            0     0.25  0.5   0.75  1     0.75  0.5   0.25

^5 The operation of scalar multiplication inside each countermeasure is done by some
usual scalar multiplication algorithm.

^6 It is applicable to any value.

^7 Since $\#E$ is one of the public parameters, the attacker knows its value.

Thus, an attacker is able to derive information about $d$ by detecting the rate of
$d'_2 = 1$ through statistical treatment of the values of $d'_2$, since the rate varies
with the value of $d$. For example, if the rate of $d'_2 = 1$ is 0.5, then he is able to
estimate that $d_2 d_1 d_0 = 010$ or $110$.
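The bias in the table can be reproduced by enumerating the random values of $k$. The following sketch assumes, as in the example, that $|k| = 2$ and that the lower three bits of the group order are 001, and it models the randomized exponent as $d' = d + k \cdot \#E$ restricted to its lower three bits; the function name `rate_of_high_bit` is hypothetical.

```python
from fractions import Fraction

def rate_of_high_bit(d_low, k_bits=2, order_low=0b001, width=3):
    """Fraction of the random k for which bit (width-1) of d + k*order is 1."""
    mask = (1 << width) - 1
    outcomes = [(d_low + k * order_low) & mask for k in range(1 << k_bits)]
    ones = sum(value >> (width - 1) for value in outcomes)
    return Fraction(ones, len(outcomes))

# Map each 3-bit pattern d2 d1 d0 to the rate of d'_2 = 1 over the four values of k.
rates = {format(d, "03b"): rate_of_high_bit(d) for d in range(8)}
for pattern, rate in rates.items():
    print(pattern, float(rate))
```

Because the rate differs between most values of $d_2 d_1 d_0$, an estimate of the rate from repeated measurements narrows the candidates for the low bits of $d$, as the example with rate 0.5 shows.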

Since this countermeasure internally requires the scalar multiplication $d'P$, its
security depends on the inner scalar multiplication algorithm. Coron implicitly
assumed that the inner scalar multiplication algorithm of Countermeasure 1 resists
SPA in order to resist DPA, because SPA-immune algorithms satisfy the 1st
requirement. Without that assumption, this countermeasure allows an attack that
checks the number or execution order of the additions and the doublings in the inner
scalar multiplication, namely the computation of $d'P$.

Next, it does not satisfy the 2nd requirement either, even if an SPA-immune
algorithm is used, because the distribution of the points which appear in the middle
of the computation is biased, since the distribution of $d'$ is biased.

We may use Algorithm 1 as an SPA-immune scalar multiplication algorithm. Since $|d'|$
is equal to $|d| + 20$ in general, the computation amount of Algorithm 1 is
$(|d| + 19)A + (|d| + 19)D$, and that of Countermeasure 1 is almost the same (besides,
the computation of the random number generation and of $d'$ is required), where $A$
and $D$ are the computation amounts of addition and doubling on the elliptic curve,
respectively. If we assume $|k| = |d|$ to resist the attack above, the computation
amount is $(2|d| - 1)A + (2|d| - 1)D$.



Countermeasure 2: blinding the point P. Since Countermeasure 2 includes the scalar
multiplication $d(R + P)$, the security of this countermeasure depends on the
algorithm which computes $d(R + P)$. In this case, as with Countermeasure 1, this
countermeasure requires an SPA-immune algorithm. Otherwise, it is easy for an
attacker to detect the scalar value $d$. All he has to do is to check the number or
the execution order of elliptic additions and doublings in the inner scalar
multiplication $d(R + P)$. The scalar value $d$ used in the computation is invariant,
whereas $d'$ of Countermeasure 1 varies.

We may use Algorithm 1 as an SPA-immune scalar multiplication algorithm. For a scalar
value of size $l$, the computation amount of Algorithm 1 is $(l - 1)A + (l - 1)D$.
Besides, Countermeasure 2 requires the addition $R + P$, the subtraction
$d(R + P) - S$ and two doublings for refreshing $R$ and $S$. The computation amount
of Countermeasure 2 is $(|d| + 1)A + (|d| + 1)D$, since the computation amount of
elliptic subtraction is almost the same as that of elliptic addition.

On the randomization of this countermeasure, we set $R_0 = R$ and $S_0 = S$ for the
first $R$ and $S$, respectively, and write $R_j$ and $S_j$ for the $R$ and $S$ after
$j$ executions, respectively. Then $R_j = (-1)^a 2^j R_0$ and $S_j = (-1)^a 2^j S_0$
for $j \ge 1$, where $a = 0$ or 1. Thus, once this scalar multiplication is done, the
number of possibilities of $R_j$ and $S_j$ does not increase; that is, there remain
only two possibilities each. It is hard to say that this works well as a
randomization. Therefore, this countermeasure does not satisfy the 2nd requirement.

We present an attack on Countermeasure 2 with Algorithm 1. The countermeasure is
vulnerable to this attack. Let $P, 2P, 4P, \ldots, 2^k P$ be points on the elliptic
curve, and let $C_j(t)$ be the function of power consumption associated with the
execution of $d(2^j P)$. First, an attacker feeds these points to the cryptographic
device which is equipped with Countermeasure 2. Then he gets the functions $C_j(t)$
and calculates the correlation function $g(t)$.^8 That is,

$$g(t) = \frac{1}{k} \sum_{j=0}^{k-1} \min\left( \frac{1}{(C_j(t + t_0) - C_{j+1}(t))^2},\ \mathrm{MAXVAL} \right),$$

where $t_0$ is the time required for each round ^9 and MAXVAL is some big number used
as a threshold in case the function takes infinity. He identifies a time $t_1$ such
that $g(t_1)$ does not vanish, and calculates the number $n_1$ which satisfies
$(n_1 - 1)t_0 < t_1 < n_1 t_0$. Then he finds $d_{l-1-n_1} = 0$.

The following is the reason why he concludes the bit is true. First, we assume 
that three significant bits of the scalar value are equal to 100, 

namely, d = (100 ■ ■) 2 - In this case, first round at (j -I- l)-th execution is the 

exactly same computation as second round at j-th execution if the random-bit 
b is equal to 0. Thus, the functions of power consumption on the interval are 
(ideally) identical. On the other hand, the functions of power consumption for 
random-bit 6=1 are not identical. Since it happens with probability 1/2, g(t) 
takes [1/2)MAXVAL on the interval if k is sufficiently large. 

Next, we assume that d = (101 * * ...)_2. According to Algorithm 1, the
computation for the second round of the j-th execution, namely, the calculation
from 2 * 2^{j-1}(P + R) to 5 * 2^{j-1}(P + R), and the first round of the (j+1)-th
execution, namely, the calculation from 2^j(P + R) to 2 * 2^j(P + R), are done as
follows:

j-th execution: 2 * 2^{j-1}(P + R) -> compute 4 * 2^{j-1}(P + R)
    -> compute 5 * 2^{j-1}(P + R) -> choose 5 * 2^{j-1}(P + R)

(j+1)-th execution: 2^j(P + R) -> compute 2 * 2^j(P + R)
    -> compute 3 * 2^j(P + R) -> choose 2 * 2^j(P + R)

Thus, the computing procedures are identical on the first half of these rounds.
Hence, in this case, g(t) takes the value (1/2)MAXVAL on the first half of the interval.

Assume that d = (110 * * ...)_2. The computation for the second round of the j-th
execution, namely, the calculation from 3 * 2^{j-1}(P + R) to 6 * 2^{j-1}(P + R), and
the first round of the (j+1)-th execution, namely, the calculation from 2^j(P + R)
to 3 * 2^j(P + R), are done as follows:

j-th execution: 3 * 2^{j-1}(P + R) -> compute 6 * 2^{j-1}(P + R)
    -> compute 7 * 2^{j-1}(P + R) -> choose 6 * 2^{j-1}(P + R)

(j+1)-th execution: 2^j(P + R) -> compute 2 * 2^j(P + R)
    -> compute 3 * 2^j(P + R) -> choose 3 * 2^j(P + R)

Footnote: The following g'(t) might be better than g(t): g'(t) = exp(C_j(t + t_0) -

Footnote: The time required for each round should be constant. Otherwise, the algorithm is
subject to timing attacks.

Thus, the computing procedures are not identical on the last half (and the first
half) of these rounds, although 6 * 2^{j-1}(P + R) is chosen in both cases. The
correlation function g(t) should tend to 0 for sufficiently large k in this case.

Finally, we assume that d = (111 * * ...)_2. The correlation function g(t)
should also tend to 0 in this case.

Therefore, a specific bit of the scalar value is detected as 1 if g(t) vanishes
and as 0 if g(t) does not vanish. Consequently, the
attacker is able to detect any bit of the scalar value.

Remark 1. This attack relies on the fact that the attacker is able to
make a specific point of one execution reappear in another execution. This is because
the number of possibilities for R after many executions is small and
the number of candidates for the next R is only two. Thus, in order to avoid
the attack, the refreshing procedure for R and S in Countermeasure 2 should be
modified as follows.

Algorithm 2 : algorithm to refresh R and S

INPUT a secret random point R and its scalar multiplication S = dR.
OUTPUT the refreshed R and S.

1. Generate a random number k of size n bits (for example, n = 20).

2. R <- kR and S <- kS

3. Output R and S.

Here, the computations kR and kS shall be implemented so as to be immune to
SPA, for example by using Algorithm 1.



Countermeasure 3 : randomized projective coordinates Countermeasure 3
requires an SPA-immune algorithm for the inner scalar multiplication,
the same as Countermeasure 2. This countermeasure satisfies the 2nd requirement
well, since most intermediate values of the computation are points on the
elliptic curve with a randomized representation. Besides the computation amount
of Algorithm 1, Countermeasure 3 requires the computation amount of random
number generation and three multiplications on the finite field for the
randomization of the point P. Thus, the computation amount of Countermeasure 3 is
(|d| - 1)A + (|d| - 1)D + 3M + R, where M and R are the computation amounts
of multiplication on the finite field and random number generation, respectively.
This countermeasure is the fastest of Coron's countermeasures; however, it
achieves less than half the performance of conventional algorithms such as the
algorithm in Jacobian coordinates with the window method [CMO98] or on
Montgomery-form elliptic curves [Mon87].

D Our Proposed Implementation 

The scalar multiplication algorithm on the Montgomery-form elliptic curve needs
one addition and one doubling on the elliptic curve per bit of the scalar value,
regardless of whether the bit is 0 or 1 [OKS00]. For the sake of an
invariant computing procedure for any scalar value, the execution order of
addition and doubling should be fixed, regardless of whether the specific bit
is 0 or 1. That is, we should choose “execute addition, and then execute
doubling” or “execute doubling, and then execute addition”. Coron’s
Countermeasure 3 works well for the 2nd requirement [Cor99] (Okeya et al. mention
this, too [OKS00]). Thus, we should adapt it to the Montgomery form.

A scalar multiplication algorithm with immunity to power analysis attacks
such as DPA, using randomized projective coordinates and the choice “execute
addition, and then execute doubling”, is the following.

Algorithm 3 : fast scalar multiplication algorithm against DPA

INPUT a scalar value d and a point P = (x, y).
OUTPUT the scalar multiplication dP.

1. Generate a random number k.
2. Express the point P = (kx, ky, k) using projective coordinates.
3. i <- |d| - 1
4. Calculate the point 2P from the point P.
5. m <- 1
6. If i is equal to 0 then go to 15. Otherwise go to 7.
7. i <- i - 1
8. If d_i is equal to 0 then go to 9. If it is equal to 1 then go to 12.
9. Calculate the point (2m + 1)P. That is, add the point mP and the
   point (m + 1)P.
10. Calculate the point 2mP. That is, double the point mP.
11. Substitute m for 2m, and go to 6.
12. Calculate the point (2m + 1)P. That is, add the point mP and the
    point (m + 1)P.
13. Calculate the point (2m + 2)P. That is, double the point (m + 1)P.
14. Substitute m for 2m + 1, and go to 6.
15. Output the point mP as the scalar multiplication dP.

Here, “Substitute m for 2m” (resp. “Substitute m for 2m + 1”) means
that the pair of points (mP, (m + 1)P) is replaced by the pair of points
(2mP, (2m + 1)P) (resp. ((2m + 1)P, (2m + 2)P)) and m is replaced by 2m
(resp. 2m + 1). The operations on the elliptic curve should be done in
projective coordinates.
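The bookkeeping of Algorithm 3 can be sketched in a few lines. The following is only an illustration under stated assumptions: the group operations are passed in as hypothetical callables `add` and `double`, integers modulo n stand in for the elliptic curve group, and the randomization of Steps 1 and 2 is omitted. It shows that the pair (mP, (m+1)P) is updated with one addition followed by one doubling for every bit.

```python
def ladder(d, P, add, double):
    """Return (dP, trace) for d >= 1; trace records the order of group operations."""
    bits = bin(d)[2:]                 # d_{|d|-1} ... d_0, the top bit is 1
    trace = ["D"]                     # Step 4: one initial doubling
    r0, r1 = P, double(P)             # the pair (mP, (m+1)P) with m = 1
    for bit in bits[1:]:
        s = add(r0, r1)               # (2m+1)P: the addition always comes first
        trace.append("A")
        if bit == "0":
            r0, r1 = double(r0), s    # -> (2mP, (2m+1)P), m becomes 2m
        else:
            r0, r1 = s, double(r1)    # -> ((2m+1)P, (2m+2)P), m becomes 2m+1
        trace.append("D")
    return r0, trace

# Stand-in group (an assumption for illustration): integers modulo n under addition.
n = 1009
point, ops = ladder(13, 5, lambda a, b: (a + b) % n, lambda a: (2 * a) % n)
print(point, "".join(ops))
```

The trace depends only on the bit length of d, not on the bit values, which is the property the 1st requirement asks for.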



Footnote: We may instead choose “execute doubling, and then execute addition”, and construct the
scalar multiplication algorithm accordingly.

Footnote: The proposed scalar multiplication algorithm is applicable not only to
Montgomery-form elliptic curves but also to general elliptic curves (including elliptic curves defined
over a finite field with characteristic 2, and elliptic curves defined over an optimal
extension field [BP98]) which have Montgomery-like scalar multiplication.
D.1 Analysis of the Security

We show that our proposed scalar multiplication algorithm satisfies the 1st and
2nd requirements in 3.1, and hence that it resists
power analysis attacks such as DPA.

Requirement 1 : In our proposed algorithm, d_i is examined at Step 8, and the
subsequent calculation branches on its value. If d_i = 0, Step 9, Step 10 and Step 11 are
executed in order, followed by a return to Step 6. If d_i = 1, Step 12, Step 13 and Step 14
are executed in order, followed by a return to Step 6. In either case, the addition
on the elliptic curve is executed first, and the doubling on the elliptic curve
is executed next. Thus, the execution order of the operations on the
elliptic curve in our algorithm is invariant. Therefore, the execution order
does not depend on the scalar value, and the 1st requirement is satisfied.
Requirement 2 : In our proposed algorithm, a random number k is generated
at Step 1, and the point P is expressed in projective coordinates using
the random number k at Step 2. The operations on the elliptic curve in the
following steps are computed in projective coordinates. Since the expression
of the point P is randomized from the start, an attacker is able to predict a
logical value in the middle of the computation but he is not able to predict
its expression. Therefore, it is impossible for him to find a value whose
appearance reveals the scalar value, and the 2nd requirement is satisfied.
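The randomization of Step 2 can be illustrated with a minimal sketch. It assumes homogeneous projective coordinates (X : Y : Z) with x = X/Z and y = Y/Z over a prime field; the function names and the toy prime are hypothetical. Every run produces a different triple, yet all of them denote the same affine point.

```python
import secrets

def randomize(x, y, p):
    """Return a random projective representation (kx, ky, k) of the affine point (x, y)."""
    k = 1 + secrets.randbelow(p - 1)          # random nonzero k in [1, p-1]
    return (k * x % p, k * y % p, k)

def to_affine(X, Y, Z, p):
    """Map (X : Y : Z) back to (X/Z, Y/Z); this costs one field inversion."""
    z_inv = pow(Z, -1, p)                     # modular inverse, Python 3.8+
    return (X * z_inv % p, Y * z_inv % p)

p = 101                                        # toy prime, for illustration only
triples = {randomize(5, 7, p) for _ in range(20)}
print(len(triples), {to_affine(*t, p) for t in triples})
```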



E Analysis of the Efficiency 

In this section, we estimate the computation amount of our proposed scalar
multiplication algorithm, and show that no other algorithm which resists DPA
is faster than ours, and that our proposed algorithm is by no means inferior in
speed to other fast scalar multiplication algorithms [CMO98, Mon87].

E.1 Performance of Our Proposed Implementation

The bit d_i is examined at Step 8; then our proposed algorithm executes one addition
and one doubling, regardless of whether the bit is equal to 0 or
1. The number of repetitions is |d| - 1. The algorithm requires one doubling at Step 4 and the
computation amount of random number generation at Step 1. The expression in
projective coordinates at Step 2 requires one multiplication on the finite field
since the subsequent computation does not require the Y-coordinate. Thus, the
computation amount of our proposed algorithm is (|d| - 1)A_M + |d|D_M + M + R,
where A_M and D_M are the computation amounts of addition and doubling on the
Montgomery-form elliptic curve, respectively. Besides, it requires one division on
the finite field for output in affine coordinates.

Addition on the Montgomery-form elliptic curve requires four multiplications
and two squarings on the finite field [Mon87], since we may not assume Z_1 = 1
because of the randomization of the point P at Step 2. Thus, the computation amount
of our proposed algorithm is (7|d| - 3)M + (4|d| - 2)S + R because A_M =
4M + 2S and D_M = 3M + 2S. Assume that S = 0.8M. Then, it is (10.2|d| - 4.6)M + R.

The computation amount per bit is around 10.2M.
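The arithmetic behind this estimate can be re-derived mechanically. The sketch below is an illustration only: the function name is hypothetical, the random number generation R and the final division are left out, and the ratio S = 0.8M is the assumption stated in the text.

```python
from fractions import Fraction

def proposed_cost(bits, s_over_m=Fraction(4, 5)):
    """Return (multiplications, squarings, total in units of M) for a scalar of `bits` bits."""
    add_m, add_s = 4, 2          # A_M = 4M + 2S
    dbl_m, dbl_s = 3, 2          # D_M = 3M + 2S
    mults = (bits - 1) * add_m + bits * dbl_m + 1     # equals 7|d| - 3 (the +1 is Step 2)
    squares = (bits - 1) * add_s + bits * dbl_s       # equals 4|d| - 2
    return mults, squares, mults + squares * s_over_m

m, s, total = proposed_cost(160)
print(m, s, float(total), float(total) / 160)
```

For |d| = 160 this gives 1117 multiplications and 638 squarings, that is 1627.4M in total, in agreement with (10.2|d| - 4.6)M.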

E.2 Comparison to Other Algorithms 

Coron’s Countermeasure 3 The computation amount of Coron’s Countermeasure 3
with Algorithm 1 is (|d| - 1)A + (|d| - 1)D + 3M + R. The computation
amounts of addition and doubling on the Weierstrass-form elliptic curve with
affine coordinates are A = 2M + S + I and D = 2M + S + I, respectively, where
I is the computation amount of inversion on the finite field. Since I is estimated
to satisfy 15M < I < 40M in general, the computation amount per bit is around
40M (footnote 12). With projective coordinates, A = 12M + 2S and D = 7M + 5S, and
the computation amount per bit is around 24.6M. With Jacobian coordinates,
A = 12M + 4S and D = 4M + 6S, and the computation amount per bit is
around 24M. Therefore, the computation amount of our proposed algorithm is
smaller than all of these.
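The per-bit figures quoted above follow from one addition plus one doubling per bit. A small sketch, assuming S = 0.8M as in the previous subsection and using a hypothetical helper `per_bit`, reproduces them:

```python
from fractions import Fraction

S = Fraction(4, 5)    # assumed cost of a squaring in units of M (S = 0.8M)

def per_bit(add, dbl):
    """add and dbl are (multiplications, squarings); one addition and one doubling per bit."""
    return (add[0] + dbl[0]) + (add[1] + dbl[1]) * S

projective = per_bit((12, 2), (7, 5))       # A = 12M + 2S, D = 7M + 5S
jacobian = per_bit((12, 4), (4, 6))         # A = 12M + 4S, D = 4M + 6S
montgomery_rpc = per_bit((4, 2), (3, 2))    # A_M = 4M + 2S, D_M = 3M + 2S
print(float(projective), float(jacobian), float(montgomery_rpc))
```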



Some fast algorithms We compare our proposed algorithm to some of the
fastest algorithms. The scalar multiplication algorithm with window methods in
Jacobian coordinates is one of the fastest algorithms [CMO98]. Its computation
amount per bit is estimated at around 10M. The computation amount per bit of
the algorithm on the Montgomery-form elliptic curve without randomized
projective coordinates is around 9.2M [Mon87, OKS00]. Compared with the fastest
algorithms, our proposed algorithm is by no means inferior in speed since
its computation amount per bit is 10.2M.



Algorithm LD2A We consider the computational performance of Algorithm
LD2A. M_2, S_2 and I_2 denote the GF(2^n)-field operations of multiplication,
squaring, and inversion, respectively. Step 4 of Algorithm LD2A requires
(1) M_2 + S_2 + I_2 for computing t, (2) S_2 for computing x + t^2 + t, and
(3) M_2 + S_2 + I_2 for computing x^2 + b/x^2. We may assume that the computation
time of squaring in GF(2^n) can be ignored, since squaring can be done by a
cyclic shift in a normal basis. An inversion I_2 in GF(2^n) requires
(at least) 7 multiplications in GF(2^n) if n > 128. Thus,
the computational amount per bit of LD2A is estimated to be 16M_2 [LD99].

Algorithm LD2P The computation amount of Madd is A_2 = 4M_2 + S_2,
and the computation amount of Mdouble is D_2 = 2M_2 + 4S_2. Ignoring
S_2, we estimate the computational amount per bit of LD2P to be 6M_2 [LD99].



A table of our comparison The following table shows our comparison of the
performance of each method.

12 Moreover, the effect of randomized projective coordinates vanishes.

Table 1. Computation amount and immunity

Type                  Comp. amount/bit   Timing Attack   SPA          DPA
[LD99] LD2A           16 M_2/bit         Immune          Dependent    Vulnerable
[LD99] LD2P           6 M_2/bit          Immune          Immune       Vulnerable
[OKS00] Mon           9.2 M/bit          Immune          Dependent    Vulnerable
[OKS00] Mon+RPC       10.2 M/bit         Immune          Dependent    Dependent
[Cor99] C3P           24.6 M/bit         Immune          Immune       Immune
[Cor99] C3J           24 M/bit           Immune          Immune       Immune
proposed algorithm    10.2 M/bit         Immune          Immune       Immune
[Mon87] Mon           9.2 M/bit          Immune          Dependent    Vulnerable
[CMO98] CMO           10 M/bit           Vulnerable      Vulnerable   Vulnerable

“Dependent” means that the immunity (or vulnerability) of the algorithm depends on
the implementation.
Abstract. We present a very efficient algorithm which, given a negative
integer Δ, Δ ≡ 1 mod 8, Δ not divisible by 3, finds a prime number p and
a cryptographically strong elliptic curve E over the prime field F_p whose
endomorphism ring is the quadratic order O of discriminant Δ. Our
algorithm is based on a variant of the complex multiplication method using
Weber functions. We describe our very efficient method to find suitable
primes for this method. Furthermore, we show that our algorithm is
feasible in reasonable time even for orders O whose class number is in
the range 200 up to 1000.

Keywords: class field theory, complex multiplication, cryptography, elliptic
curve, factoring polynomials, finite field



A Introduction 

Elliptic curve cryptography is a very efficient basic technology for public key 
infrastructures. An important computational problem in elliptic curve cryptog- 
raphy is the construction of cryptographically strong curves over finite prime 
fields. One approach is to randomly generate a curve, determine its cardinality 
by point counting, and then check whether it is cryptographically strong [MP97].
Unfortunately, point counting over large prime fields is rather slow. 

Another method uses complex multiplication. It first searches for the car- 
dinality of a cryptographically strong curve and it then constructs a curve of 
that cardinality. If the endomorphism ring of the curve has small class num- 
ber, this is faster than searching for a strong curve by point counting. However, 
the German National Security Agency (BSI), for example, considers small class
numbers of the endomorphism ring a possible security risk (see [BSI00]) since
there are only very few such curves. The BSI recommends using curves whose endomorphism
ring is an order in an imaginary quadratic number field of class number at least 200.
For such curves, the standard complex multiplication method [AM93] is rather
inefficient since the curve is constructed using a polynomial whose degree is the
class number and which has extremely large coefficients.

In this paper we present a variant of the complex multiplication construction.
The main theory of this variant can be found in [LZ94] or appendix A.13-A.14 of
the IEEE P1363 standard [IEEE]. However, none of these contributions describes
how to find suitable prime fields efficiently in the scope of generating
cryptographically strong elliptic curves. We develop a very efficient algorithm to find
a suitable prime field for this variant. In practice, all generated primes will be
different.

As mentioned above, the main reproach to the standard complex multipli- 
cation method is that it is restricted to a small set of elliptic curves whose 
endomorphism ring has a small class number. The practical results of our imple- 
mentation show that our algorithm is not restricted to this set of elliptic curves. 
For example, our algorithm constructs curves with endomorphism ring of class 
number 200 in 33 seconds on a SUN UltraSPARC-IIi. For class number 500 the 
algorithm requires less than two minutes, and for class number 1000 it takes 
about 7 minutes. The algorithm uses very little storage. Thus it can be used on 
any personal computer. This means that the users of a public key infrastructure 
can generate their own unique cryptographically strong elliptic curve. 

Input of our algorithm is the discriminant A of an imaginary quadratic order 
O. First, the algorithm searches for a prime number p which is the norm of an 
element of O, and such that there exists a cryptographically strong curve over 
the prime field Fp. We show how to make this search fast. Once p is found, the 
algorithm calculates a zero mod p of a polynomial suggested by Yui and Zagier
[YZ97], also used in [LZ94], which has much smaller coefficients than the
polynomial from [AM93]. Using that zero, it is easy to construct the cryptographically
strong curve and a point of large prime order on that curve.

The paper is organized as follows: In Section B we define cryptographically
strong elliptic curves over prime fields. In Section C we present our algorithm.
Finally we give examples and running times in Section D.

B Cryptographically Strong Elliptic Curves 

In this section we review a few basic facts concerning elliptic curves over finite 
fields and define cryptographically strong curves. 

Let p be a prime number, p > 3. An elliptic curve over the prime field F_p of
characteristic p is a pair E = (a, b) ∈ F_p^2 with 4a^3 + 27b^2 ≠ 0. A point on E is
a solution (x, y) ∈ F_p^2 of y^2 = x^3 + ax + b or the point at infinity O obtained by
considering the projective closure of this equation. The set of points on E over
F_p is denoted by E(F_p). It carries a group structure with the point at infinity
acting as the identity element.

We call the elliptic curve E cryptographically strong if it satisfies the follow- 
ing conditions which make the cryptosystems, in which E is used, secure and 
efficient. 

We first consider security. If the elliptic curve E is used in a cryptosystem,
the security of this cryptosystem is based on the intractability of the discrete
logarithm problem in the group of points E(F_p). Several discrete logarithm
algorithms are known. To make their application infeasible, we require that E
satisfies the following conditions.

1. We have |E(F_p)| = c * l with a prime l > 2^160 and a positive integer c.

2. The primes l and p are different.

3. The order of p in the multiplicative group F_l^× of F_l is at least 2000/log_2 p.

The first condition excludes the application of discrete logarithm algorithms
whose running time is roughly the square root of the largest prime factor of
the group order (see for example [vOW99]). The second condition makes the
anomalous curve attack impossible (see [Sma99]). The last condition excludes
the attack of Menezes, Okamoto, and Vanstone [MOV91] which reduces the
discrete logarithm problem in E(F_p) to the discrete logarithm problem in a
finite extension field of F_p. The degree of this extension over F_p is at least
the order of p in F_l^×. The third condition is based on the assumption that
the discrete logarithm problem in a finite field whose cardinality is a 2000-bit
number is intractable. We remark that the German National Security Agency
[BSI00] requires the order of p in F_l^× to be at least 10^4.

Let us now consider efficiency. Suppose that an elliptic curve E over a prime
field F_p satisfies the security conditions. If this curve is used in a cryptosystem,
the efficiency of this system depends on the efficiency of the arithmetic in F_p.
So p should be as small as possible. It follows from a theorem of Hasse that

$$ \left(\sqrt{|E(\mathbb{F}_p)|} - 1\right)^2 \le p \le \left(\sqrt{|E(\mathbb{F}_p)|} + 1\right)^2 . \tag{1} $$

Hence, we try to make |E(F_p)| as small as possible. Now the first security
condition implies

$$ |E(\mathbb{F}_p)| = c \cdot l \tag{2} $$

with a prime number l > 2^160 and a positive integer c, the so-called cofactor. The
security of the cryptosystem in which E is used is based on the intractability
of the discrete logarithm problem in the subgroup of order l in E(F_p). This
security is independent of c. Therefore, c should be as small as possible. In our
algorithm c = 1 is not possible. But we require the following condition, which
implies the first security condition.

4. We have |E(F_p)| = c * l with a prime number l > 2^160 and a positive integer
c ≤ 4.

We say that an elliptic curve E over a finite prime field Fp is cryptographically 
strong if it satisfies the four conditions from above. 
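Inequality (1) is equivalent to the more familiar form |N - (p + 1)| ≤ 2√p for N = |E(F_p)|, which can be tested in exact integer arithmetic. The sketch below is an illustration with a hypothetical function name; it also shows why a group order of about 2^162 forces p to have about 162 bits.

```python
def hasse_ok(p, n):
    """True if n is compatible with Hasse's theorem as a group order over F_p."""
    return (n - p - 1) ** 2 <= 4 * p      # |n - (p + 1)| <= 2*sqrt(p), squared

# For N = 4 * 2^160 = 2^162 the lower end of (1) is (2^81 - 1)^2 = 2^162 - 2^82 + 1.
N = 2 ** 162
lowest = 2 ** 162 - 2 ** 82 + 1
print(hasse_ok(lowest, N), hasse_ok(lowest - 1, N), lowest.bit_length())
```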

We explain an additional security condition required by the German National
Security Agency BSI ([BSI00]). The third condition implies that the endomorphism
ring End(E) of the elliptic curve over the algebraic closure of F_p is
an imaginary quadratic order. The BSI requires the following.

5. The class number of the maximal order which contains End(E) is at
least 200.

The reason for this condition is that among all curves over a prime field only 
very few have endomorphism rings with small class numbers. So those curves 
may be subject to specific attacks. However, no such attacks are known. 
C The Algorithm

Let Δ be a negative integer, Δ ≡ 0, 1 mod 4, and let O be the imaginary quadratic
order of discriminant Δ. This order is denoted by O_Δ. Also, let p be a prime
number which is the norm of an element in O, that is, there are positive integers
t, y such that

$$ t^2 - \Delta y^2 = 4p . \tag{3} $$

Using complex multiplication, elliptic curves E_{1,p} and E_{2,p} over F_p with
endomorphism ring O and

$$ |E_{1,p}(\mathbb{F}_p)| = p + 1 - t, \qquad |E_{2,p}(\mathbb{F}_p)| = p + 1 + t \tag{4} $$

can be constructed as follows (see [AM93], [BSS99]).

Let H ∈ Z[X] be the minimal polynomial of j((Δ + √Δ)/2), where j = 12^3 J
with Klein’s modular function J (see [Apo90]). Modulo p the polynomial H
splits into linear factors. Let j_p be a zero of H mod p, that is, j_p is an integer
such that H(j_p) ≡ 0 mod p. We assume Δ < -4 in what follows. Then we have
j_p ∉ {0, 1728}. Let s_p be a quadratic nonresidue mod p. With



we have 

$$ \{E_{1,p}, E_{2,p}\} = \{(a_p, b_p),\ (a_p s_p^2,\ b_p s_p^3)\} . \tag{6} $$

After this construction it is not known which of the curves is E_{1,p} and which is
E_{2,p}. However, by choosing points on each curve and testing whether their order
is a divisor of p + 1 + t or p + 1 - t, the curves E_{1,p} and E_{2,p} can be identified.
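As an illustration of how a curve and its twist arise from a j-invariant, the sketch below uses a common textbook parametrization, which is an assumption made here: κ = j/(1728 - j), (a, b) = (3κ, 2κ), valid for j ∉ {0, 1728}. The function names and the toy values p = 101, j = 5, s = 2 are hypothetical. It checks numerically that both curves have the prescribed j-invariant and that their orders sum to 2p + 2, which is what (4) predicts.

```python
def curve_from_j(j, p):
    """Return (a, b) with j-invariant j over F_p; requires j not in {0, 1728} mod p."""
    kappa = j * pow((1728 - j) % p, -1, p) % p
    return 3 * kappa % p, 2 * kappa % p

def twist(a, b, s, p):
    """Quadratic twist (a*s^2, b*s^3) by a quadratic nonresidue s."""
    return a * s * s % p, b * pow(s, 3, p) % p

def j_invariant(a, b, p):
    disc = (4 * pow(a, 3, p) + 27 * b * b) % p
    return 1728 * 4 * pow(a, 3, p) * pow(disc, -1, p) % p

def count_points(a, b, p):
    """Naive point count of y^2 = x^3 + ax + b, only sensible for tiny p."""
    total = 1                                   # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        if rhs == 0:
            total += 1
        elif pow(rhs, (p - 1) // 2, p) == 1:    # Euler's criterion: rhs is a square
            total += 2
    return total

p, j, s = 101, 5, 2                             # toy values; 2 is a nonresidue mod 101
e1 = curve_from_j(j, p)
e2 = twist(*e1, s, p)
n1, n2 = count_points(*e1, p), count_points(*e2, p)
print(e1, e2, n1, n2)
```

With t = p + 1 - n1, the two orders are p + 1 - t and p + 1 + t, so the test on point orders described above decides which curve is which.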

We can decide whether one of the curves E_{1,p} or E_{2,p} is cryptographically
strong before we actually construct those curves. We only need to know the
prime number p and its representation (3). Then we know the orders of E_{1,p}
and E_{2,p} from (4). Using those orders we can check whether one of the curves is
cryptographically strong because the conditions from the previous section only
depend on p and the order of the curve.

Our algorithm cryptoCurve(Δ, v, w) for computing cryptographically strong
curves proceeds as follows. Input is a negative integer Δ with Δ ≡ 1 mod 8,
Δ ≢ 0 mod 3, and positive integers v, w with w > v > 2^162. It uses the algorithm
findPrime(Δ, v, w) explained below, which determines a prime number p ∈ [v, w]
with a representation (3) such that E_{1,p} or E_{2,p} is cryptographically strong. It
also returns the prime number l from (2) with l > 2^160. Once p is found, the
algorithm findCurve(Δ, p, l) constructs E_{1,p} and E_{2,p} and returns one of those
curves which is cryptographically strong. findCurve also returns a point P of
order l on the curve. We only use discriminants Δ ≡ 1 mod 8, Δ ≢ 0 mod 3
because for those discriminants findCurve can be implemented very efficiently.
cryptoCurve(Δ, v, w)

Input: Δ ∈ Z_{<0}, Δ ≡ 1 mod 8, Δ ≢ 0 mod 3, v, w ∈ Z, w > v > 2^162.

Output: A prime p ∈ [v, w].
A cryptographically strong elliptic curve E over F_p with endomorphism ring O_Δ.
A prime divisor l of |E(F_p)| with l > 2^160 such that |E(F_p)| = 4l.
A point P in E(F_p) of order l.

(p, l) <- findPrime(Δ, v, w);
(E, P) <- findCurve(Δ, p, l);
return (p, E, l, P);



By Lemma 1 the order of a curve with endomorphism ring O_Δ is then divisible
by 4.

In order for the algorithm findPrime(Δ, v, w) to be successful it is necessary
that the interval [v,w] is sufficiently large. We use intervals of length 2^^^. 

To satisfy condition 5 of the previous section, we choose Δ such that O_Δ is
a maximal order and the class number of O_Δ is at least 200. Also, if we want
to generate many different curves for a fixed Δ, then we choose many pairwise
disjoint intervals [v, w].

Next, we explain our algorithm findPrime(Δ, v, w). So let Δ, v, w be as
defined in cryptoCurve(Δ, v, w). We have to find positive integers t and y such
that p = (t^2 - Δy^2)/4 is a prime number, p ∈ [v, w], and (p + 1 - t)/4 or
(p + 1 + t)/4 is a prime l > 2^160. As the next lemma shows, those requirements
imply congruence conditions for t and y.

Lemma 1. If Δ ≡ 1 mod 8 and t, y are positive integers such that p = (t^2 -
Δy^2)/4 is a prime, then (t mod 4, y mod 4) ∈ {(0, 2), (2, 0)} and p + 1 + t ≡
p + 1 - t ≡ 0 mod 4. Also, if (p + 1 + t)/4 is prime, then (t mod 8, y mod
8) ∈ {(2, 0), (6, 4)}, and if (p + 1 - t)/4 is prime, then (t mod 8, y mod 8) ∈
{(2, 4), (6, 0)}.

Proof. Since Δ is odd, we have t^2 - Δy^2 ≡ 0 mod 4 if and only if t ≡ y ≡
0 mod 2 or t ≡ y ≡ 1 mod 2. Let t and y both be odd. Then t^2 ≡ y^2 ≡ 1 mod 8
and t^2 - Δy^2 ≡ 1 - 1 * 1 ≡ 0 mod 8. Therefore, (t^2 - Δy^2)/4 is even. Now
let t and y both be even. If t ≡ y ≡ 0 mod 4, then t^2 - Δy^2 ≡ 0 mod 16,
and again (t^2 - Δy^2)/4 is even. The same is true if t ≡ y ≡ 2 mod 4 since we
have t^2 ≡ y^2 ≡ 4 mod 8 and t^2 - Δy^2 ≡ 0 mod 8. A necessary condition for
(t^2 - Δy^2)/4 to be an odd prime is therefore (t mod 4, y mod 4) ∈ {(0, 2), (2, 0)}.
Then it is easy to see that p + 1 + t ≡ p + 1 - t ≡ 0 mod 4. As both t and y are even,
there are integers t_0, y_0 with t = 2t_0 and y = 2y_0. Let (t mod 4, y mod 4) = (0, 2).
Then we have t_0 = 2t_1 with an integer t_1 and y_0^2 ≡ 1 mod 8. It follows that
p + 1 ± t = t_0^2 - Δy_0^2 + 1 ± 2t_0 ≡ 4t_1^2 - 1 + 1 ± 4t_1 = 4t_1(t_1 ± 1) ≡ 0 mod 8. Hence,
neither (p + 1 + t)/4 nor (p + 1 - t)/4 can be prime. Now let (t mod 4, y mod 4) =
(2, 0). Furthermore, assume that y_0 ≡ 0 mod 4. Then we have y_0^2 ≡ 0 mod 8 and
p + 1 ± t = t_0^2 - Δy_0^2 + 1 ± 2t_0 ≡ 1 - 1 * 0 + 1 ± 2t_0 = 2(1 ± t_0) mod 8. If
t_0 ≡ 1 mod 4, that is (t mod 8, y mod 8) = (2, 0), we get p + 1 + t ≡ 4 mod 8 and
p + 1 - t ≡ 0 mod 8. Hence, only (p + 1 + t)/4 can be prime. If t_0 ≡ 3 mod 4,
that is (t mod 8, y mod 8) = (6, 0), we get p + 1 + t ≡ 0 mod 8 and p + 1 - t ≡
4 mod 8. Hence, only (p + 1 - t)/4 can be prime. The case y_0 ≡ 2 mod 4 is treated
analogously. □
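The congruence conditions of Lemma 1 only involve parities, so they can be checked by brute force over small t and y. The sketch below is such a sanity check, not a proof; the function name is hypothetical, and "is prime" is weakened to "is an odd integer", which is the only property the proof uses.

```python
def check_lemma1(delta, bound=64):
    """Check the residue classes of Lemma 1 for all 1 <= t, y < bound."""
    assert delta < 0 and delta % 8 == 1
    for t in range(1, bound):
        for y in range(1, bound):
            n = t * t - delta * y * y
            if n % 4 or (n // 4) % 2 == 0:
                continue                       # p is not an odd integer: not an odd prime
            p = n // 4
            if (t % 4, y % 4) not in {(0, 2), (2, 0)}:
                return False
            if (p + 1 + t) % 4 or (p + 1 - t) % 4:
                return False
            if ((p + 1 + t) // 4) % 2 and (t % 8, y % 8) not in {(2, 0), (6, 4)}:
                return False
            if ((p + 1 - t) // 4) % 2 and (t % 8, y % 8) not in {(2, 4), (6, 0)}:
                return False
    return True

print(all(check_lemma1(d) for d in (-7, -15, -23, -71)))
```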

It follows from Lemma 1 that the order of a cryptographically strong elliptic
curve E over F_p is 4l with a prime l > 2^160. In the algorithm findPrime we first
initialize y to a random integer in [1, ⌊√(4v/|Δ|)⌋] which is 0 mod 8. We restrict to
y ≡ 0 mod 8 to avoid distinguishing several cases. For all possible t ≡ 2 mod 4 we
try to find a cryptographically strong curve. By Lemma 1 the number (p + 1 + t)/4
can only be prime if t ≡ 2 mod 8. If this is true, we test whether p + 1 + t is the
order of a cryptographically strong curve by using the function isStrong. If the
order happens to be cryptographically strong, this function returns the prime l
such that p + 1 + t = 4l. Otherwise, the function returns 0. If t ≡ 6 mod 8, then
we test whether p + 1 - t is the order of a cryptographically strong curve. If no
cryptographically strong curve is found, then the algorithm chooses a new y and
repeats the above procedure. Since y is chosen randomly, the algorithm returns
a different curve each time it is called.



findPrime(Δ, v, w)

Input: Δ ∈ Z_{<0}, Δ ≡ 1 mod 8, Δ ≢ 0 mod 3, v, w ∈ Z, w > v > 2^162.
Output: Primes p ∈ [v, w] and l > 2^160
such that |E(F_p)| = 4l for a cryptographically strong curve E over F_p
with endomorphism ring O_Δ.

while true do
    Choose a random y in [1, ⌊√(4v/|Δ|)⌋] with y ≡ 0 mod 8;
    t <- 4⌈(√(4v - |Δ|y^2) - 2)/4⌉ + 2;
    while p <- (t^2 + |Δ|y^2)/4 ≤ w do
        if p is prime then
            if t ≡ 2 mod 8 and l <- isStrong(p, p + 1 + t) ≠ 0 then
                return (p, l);
            else if l <- isStrong(p, p + 1 - t) ≠ 0 then
                return (p, l);
            end if
        end if
        t <- t + 4;
    end while
end while



If the interval [v, w] used in findPrime is too small then findPrime may
not terminate. According to our experiments, intervals of length 2^^^ seem to be 
sufficiently large. 

The function isStrong decides whether a curve, whose order is determined
in findPrime, is cryptographically strong. The order of p in the multiplicative
group F_l^× is denoted by ord_{F_l^×}(p). The algorithm implements the conditions
from Section B.
isStrong(p, N)

Input: A prime p and N ∈ ℕ, N ≡ 0 mod 4.

Output: A prime l with N = 4l if N is the order of a cryptographically strong elliptic
curve over F_p,
l = 0 otherwise.

if l <- N/4 is prime and l > 2^160 and p ≠ l and ord_{F_l^×}(p) ≥ ⌈2000/log_2 p⌉ then
    return l;
else
    return 0;
end if
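The two routines findPrime and isStrong can be sketched together as follows. This is a hedged illustration, not the LiDIA implementation: the names `is_probable_prime`, `is_strong`, `find_prime` and the parameters `min_l_bits`, `field_bits`, `rng` are hypothetical, primality is tested with a fixed-base Miller-Rabin test (probabilistic for large inputs), and the thresholds 2^160 and 2000 are exposed as parameters so that the search can be tried on toy sizes. The order condition is checked directly: ord(p) ≥ B holds exactly when p^k ≢ 1 mod l for all 1 ≤ k < B.

```python
import math
import random

_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n):
    """Miller-Rabin with fixed bases (probabilistic for large n)."""
    if n < 2:
        return False
    for q in _BASES:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def is_strong(p, n, min_l_bits=160, field_bits=2000):
    """Return l = n/4 if n passes the conditions of isStrong, else 0."""
    if n % 4:
        return 0
    l = n // 4
    if l <= 2 ** min_l_bits or l == p or not is_probable_prime(l):
        return 0
    bound = math.ceil(field_bits / math.log2(p))
    x = 1
    for _ in range(1, bound):              # reject if p^k = 1 mod l for some k < bound
        x = x * p % l
        if x == 1:
            return 0
    return l

def find_prime(delta, v, w, rng=random, **thresholds):
    """Search for (p, l) along the lines of findPrime; assumes delta = 1 mod 8."""
    abs_d = -delta
    y_max = math.isqrt(4 * v // abs_d)
    while True:
        y = 8 * rng.randint(1, y_max // 8)                   # random y = 0 mod 8
        rest = 4 * v - abs_d * y * y
        r = math.isqrt(rest - 1) + 1 if rest > 0 else 0      # ceil(sqrt(rest))
        t = r + (2 - r) % 4                                  # smallest t >= r, t = 2 mod 4
        while (p := (t * t + abs_d * y * y) // 4) <= w:
            if is_probable_prime(p):
                n = p + 1 + t if t % 8 == 2 else p + 1 - t   # only this sign can give odd n/4
                l = is_strong(p, n, **thresholds)
                if l:
                    return p, l
            t += 4

# Toy-sized run (assumed parameters, far below cryptographic size).
print(find_prime(-71, 2 ** 40, 2 ** 41, min_l_bits=30, field_bits=200))
```

Selecting the sign from t mod 8 is a shortcut justified by Lemma 1; the pseudocode above reaches the same result by letting isStrong reject the other candidate.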



The function findCurve implements the construction of the curves as
explained in (5) and (6).



findCurve(Δ, p, l)

Input: Δ ∈ Z_{<0}, Δ ≡ 1 mod 8, Δ ≢ 0 mod 3. A prime p such that there exists a
curve of order 4l over F_p.

Output: An elliptic curve E over F_p with |E(F_p)| = 4l and endomorphism ring O_Δ.
A point P in E(F_p) of order l.

j_p <- findRoot(Δ, p);
Select a quadratic nonresidue s_p mod p;
E_1 <- (a_p, b_p);
E_2 <- (a_p s_p^2, b_p s_p^3);
while true do
    Choose Q_1 randomly in E_1(F_p) \ {O};
    Choose Q_2 randomly in E_2(F_p) \ {O};
    if 4Q_1 ≠ O and 4lQ_1 = O then
        return (E_1, 4Q_1);
    else if 4Q_2 ≠ O and 4lQ_2 = O then
        return (E_2, 4Q_2);
    end if
end while



The function findRoot(Δ, p) determines a zero mod p of the polynomial H
from above. If the class number of O_Δ is large then the coefficients of H are
extremely large. Instead, following [YZ97], we use a polynomial W of the same
degree which is computed by findPolynomial(Δ). This polynomial has much
smaller coefficients. For example, if Δ = -71 then

H = X^7 + 313645809715X^6 - 3091990138604570X^5 +
    98394038810047812049302X^4 - 823534263439730779968091389X^3 +
    5138800366453976780323726329446X^2 -
    425319473946139603274605151187659X +
    737707086760731113357714241006081263
and

W = X^7 - X^6 - X^5 + X^4 - X^3 - X^2 + 2X + 1 .

Also, if w_p is a zero of W mod p then

j_p = (w_p^24 - 16)^3 / w_p^24

is a zero of H mod p.



findRoot(Δ, p)

Input: Δ ∈ Z_{<0}, Δ ≡ 1 mod 8, Δ ≢ 0 mod 3.
A prime p which is the norm of an element of O_Δ.
Output: A root j_p of H mod p.

W <- findPolynomial(Δ);
w_p <- find_root(p, W);
return j_p <- (w_p^24 - 16)^3 / w_p^24;



Next, we explain findPolynomial(Δ). In this algorithm, we first compute
the set R_Δ of reduced integral primitive forms (a, b, c) = ax^2 + bxy + cy^2 of
discriminant Δ. An algorithm for computing this set can be found in [Coh95],
p. 228. We remark that |R_Δ| is the class number of the quadratic order of
discriminant Δ. Using R_Δ we define the polynomial W. For τ in the upper half
plane of C we set q_τ = e^{2πiτ}. The Weber functions (see [Web02]) are

$$ f(\tau) = q_\tau^{-1/48} \prod_{n=1}^{\infty} \left(1 + q_\tau^{\,n - 1/2}\right), \tag{7} $$

$$ f_1(\tau) = q_\tau^{-1/48} \prod_{n=1}^{\infty} \left(1 - q_\tau^{\,n - 1/2}\right), \tag{8} $$

$$ f_2(\tau) = \sqrt{2}\, q_\tau^{1/24} \prod_{n=1}^{\infty} \left(1 + q_\tau^{\,n}\right). \tag{9} $$



For g = (a, b, c) ∈ R_Δ set τ_g = (-b + √Δ)/(2a). Then τ_g lies in the upper half plane of
C. We set



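$$ f(g) := \begin{cases}
\zeta^{b(a - c - ac^2)} \cdot f(\tau_g) & \text{if } 2 \mid a,\ 2 \mid c, \\
(-1)^{(\Delta - 1)/8} \cdot \zeta^{b(a - c - ac^2)} \cdot f_1(\tau_g) & \text{if } 2 \mid a,\ 2 \nmid c, \\
(-1)^{(\Delta - 1)/8} \cdot \zeta^{b(a - c + a^2 c)} \cdot f_2(\tau_g) & \text{if } 2 \nmid a,\ 2 \mid c,
\end{cases} \tag{10} $$

with ζ = e^{2πi/48}. Then the polynomial W is

$$ W = \prod_{g \in R_\Delta} \left(X - f(g)\right) . \tag{11} $$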
This polynomial has integer coefficients. In findPolynomial(Δ), we compute
W using (11). The zeros f(g) of W, g ∈ R_Δ, are defined as infinite products.
We approximate them by truncated products using the LiDIA type bigcomplex
with precision (as in [LZ94])



More precisely, we keep multiplying the factors of f{g) in d7J, ([S]) , or ® until 
five successive partial products are equal within the chosen precision. The fol- 
lowing observation can be used to speed up the algorithm. If (a, b, c) G Ra then 
f{a,—b,c) = f{a,b,c). Hence, if {a,—b,c) also belongs to Ra, then we obtain 
f{a,—b,c) almost for free. By multiplying all linear factors X — f{g) using our 
truncated products f{g), we obtain a polynomial whose coefficients are close to 
integers. We determine W by rounding each coefficient to the closest integer. 
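The truncation rule can be illustrated with ordinary double precision in place of the LiDIA type bigcomplex. The sketch below is an assumption-laden illustration, not the authors' implementation: the helper names, the relative tolerance `tol` and the use of Python's `cmath` are all choices made here. It multiplies factors of the Weber products until several successive partial products agree.

```python
import cmath
import math

def _truncated_product(factor, run=5, tol=1e-15, max_terms=10_000):
    """Multiply factor(1), factor(2), ... until `run` successive
    partial products agree to relative tolerance `tol`."""
    prod = 1 + 0j
    stable = 0
    for n in range(1, max_terms + 1):
        new = prod * factor(n)
        stable = stable + 1 if abs(new - prod) <= tol * abs(new) else 0
        prod = new
        if stable >= run:
            return prod
    raise ArithmeticError("product did not stabilise")

def weber(tau: complex):
    """Return (f, f1, f2) at tau in the upper half plane (double precision)."""
    def q(e):  # q_tau ** e, computed as exp(2*pi*i*tau*e)
        return cmath.exp(2j * math.pi * tau * e)
    f = q(-1 / 48) * _truncated_product(lambda n: 1 + q(n - 0.5))
    f1 = q(-1 / 48) * _truncated_product(lambda n: 1 - q(n - 0.5))
    f2 = math.sqrt(2) * q(1 / 24) * _truncated_product(lambda n: 1 + q(n))
    return f, f1, f2
```

A quick plausibility check uses the classical identities $f f_1 f_2 = \sqrt{2}$ and $f^8 = f_1^8 + f_2^8$, which the truncated values should satisfy up to rounding error.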

To compute a zero $w_p$ of W mod p we use the LiDIA function
find_root(p, W). As input this function requires a prime p and a polynomial
$W \in \mathbb{Z}[X]$ which splits into linear factors modulo p. It returns a zero of
W mod p. find_root uses the Cantor-Zassenhaus split (see [Coh95]) and a
polynomial arithmetic due to Shoup [Sho95].



D Examples and Running Times

We implemented our algorithm in C++ using the library LiDIA 2.0 ([LiDIA]) and
the GNU compiler 2.95.2. The timings were measured on a SUN UltraSPARC-IIi
running Solaris 2.6 at 333 MHz and having 512 MB of main memory.

We present three examples. The class numbers in these examples are at least
200 to meet condition 5 of Sect. B. Timings, including those for class numbers
less than 200, can be found in Tables 1, 2 and 3.

Let us first consider a fundamental discriminant of class number 200. We
take $\Delta = -21311$ as $\Delta$ is maximal with this property. We use the floating point
precision F = 48. Running times can be found in Table 2. We get the following
output:



F = 



G+ 1 + 5 



+ 1 



47 



where 
p = 11243881047220088360258169203680015118559820821349
$\lfloor \log_2 p \rfloor + 1 = 163$

l = 2810970261805022090064542583157653944571338913691
$\lfloor \log_2 l \rfloor + 1 = 161$

E = (9664333206544914885970277381389792397205945721715,
     9712273578800196724738364136255277465294004742516)
$|E(\mathbb{F}_p)| = 4 \cdot l$

P = (9734858823530657227741101030084051747415204761009,
     9869588797805709934476211640090148068685)

In the second example we use the fundamental discriminant $\Delta = -96599$
with h = 500. Again, $\Delta$ is maximal with this property. We use F = 129. We
obtain the following output:

p = 11239364022024445602395071076291409161418004790449
$\lfloor \log_2 p \rfloor + 1 = 163$

l = 2809841005506111400598767640310076774940711944009
$\lfloor \log_2 l \rfloor + 1 = 161$

E = (3056338225511409442512995201958098190832410825281,
     9530468165023903363272044185499671568166943743820)
$|E(\mathbb{F}_p)| = 4 \cdot l$

P = (7706333343307924406946634579911418698936319570270,
     40461280021002528566953548016721796786860)

To show the limit of the algorithm we also present an example with the
fundamental discriminant $\Delta = -10000031$ and h = 5426. We use the floating
point precision F = 1986. We get the following output:

p = 11575224349584783304586589602930035397585956397089
$\lfloor \log_2 p \rfloor + 1 = 163$

l = 2893806087396195826146647734591894354891848869089
$\lfloor \log_2 l \rfloor + 1 = 161$

E = (97581518612045702399500389635401693009593305388,
     65054345741363801599666926423601128673062203592)
$|E(\mathbb{F}_p)| = 4 \cdot l$

P = (4475160835035949933240439608822439678768106791477,
     6992349469932163330795423460823116723197169323796)



The total running time was 1 day 13 hours 41 min 39 sec. The time to compute
W and $w_p$ was 1 day 13 hours 21 min 56 sec and 19 min 13 sec, respectively.
Table 1. Timings in seconds for fundamental discriminants of class numbers 20 up
to 150. Each discriminant is maximal for a given class number. Each first row contains
the running time, each second row the standard deviation.

| $\Delta$ | Number of tests | Row | Time of cryptoCurve | Time of findPrime | Time of findPolynomial | Time of find_root |
|---|---|---|---|---|---|---|
| -455 | 100 | time | 6.235 | 5.275 | 0.05463 | 0.6242 |
| | | std. dev. | 5.172 | 5.166 | 0.03266 | 0.05251 |
| -1271 | 100 | time | 8.278 | 5.871 | 0.1245 | 2.004 |
| | | std. dev. | 5.345 | 5.347 | 0.04943 | 0.1157 |
| -2159 | 100 | time | 10.03 | 5.079 | 0.2492 | 4.421 |
| | | std. dev. | 4.901 | 4.869 | 0.08125 | 0.2091 |
| -5183 | 100 | time | 15.08 | 6.238 | 0.3778 | 8.174 |
| | | std. dev. | 7.269 | 7.213 | 0.1382 | 0.3276 |
| -7991 | 100 | time | 17.02 | [illegible] | 0.6289 | 10.28 |
| | | std. dev. | 5.806 | [illegible] | 0.2033 | 0.4606 |
| -11879 | 100 | time | [illegible] | 6.418 | 1.272 | 19.46 |
| | | std. dev. | [illegible] | 5.654 | 1932 | 0.7965 |



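Table 2. Timings in seconds for some fundamental discriminants of class number 200.
Each first row contains the running time, each second row the standard deviation.

| $\Delta$ | Number of tests | Row | Time of cryptoCurve | Time of findPrime | Time of findPolynomial | Time of find_root |
|---|---|---|---|---|---|---|
| -21311 | 100 | time | [illegible] | 5.808 | 2.420 | 24.52 |
| | | std. dev. | [illegible] | 5.458 | 0.009839 | 0.5377 |
| -31031 | 100 | time | [illegible] | [illegible] | 2.739 | 24.26 |
| | | std. dev. | [illegible] | [illegible] | 0.003942 | 0.5232 |
| -40895 | 100 | time | [illegible] | [illegible] | 2.720 | 24.20 |
| | | std. dev. | [illegible] | [illegible] | 0.004976 | 0.5489 |
| -50183 | 100 | time | [illegible] | [illegible] | 3.042 | 24.33 |
| | | std. dev. | [illegible] | [illegible] | 0.008005 | 0.5736 |
| -100015 | 100 | time | [illegible] | [illegible] | 3.033 | 24.29 |
| | | std. dev. | [illegible] | [illegible] | 0.007614 | 0.5901 |
| -500047 | 100 | time | [illegible] | [illegible] | 4.368 | 24.43 |
| | | std. dev. | [illegible] | [illegible] | 0.01136 | 0.5162 |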



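Table 3. Timings in seconds for fundamental discriminants of class numbers 500 up
to 1000. Each discriminant is maximal for a given class number. Each first row contains
the running time, each second row the standard deviation.

| $\Delta$ | h | Number of tests | Row | Time of cryptoCurve | Time of findPrime | Time of findPolynomial | Time of find_root |
|---|---|---|---|---|---|---|---|
| -96599 | 500 | 25 | time | 101.2 | 8.146 | 29.30 | 63.36 |
| | | | std. dev. | 7.107 | 6.782 | 0.1116 | 1.201 |
| -148511 | [illegible] | 25 | time | 152.9 | 5.846 | 49.03 | 97.56 |
| | | | std. dev. | 5.948 | 4.450 | 0.2956 | 2.642 |
| -185471 | [illegible] | 25 | time | 197.8 | 7.426 | 82.29 | 107.7 |
| | | | std. dev. | 9.068 | 8.576 | 0.5049 | 1.572 |
| -233999 | [illegible] | 25 | time | 243.2 | [illegible] | 120.6 | 117.2 |
| | | | std. dev. | 4.613 | [illegible] | 0.4135 | 1.034 |
| -299519 | [illegible] | 25 | time | 320.5 | [illegible] | 188.8 | 126.5 |
| | | | std. dev. | 5.258 | [illegible] | 1.141 | 1.684 |
| -412079 | 1000 | 25 | time | 430.0 | 6.282 | 283.1 | 140.0 |
| | | | std. dev. | 6.638 | 6.708 | 0.3415 | 1.580 |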
Abstract. In this paper we describe an efficient algorithm for multiplication
in $\mathbb{F}_{2^m}$, where the field elements of $\mathbb{F}_{2^m}$ are represented in standard
polynomial basis. The proposed algorithm can be used in practical software
implementations of elliptic curve cryptography. Our timing results,
on several platforms, show that the new method is significantly faster
than the “shift-and-add” method.

Key words. Multiplication in $\mathbb{F}_{2^m}$, Polynomial Basis, Elliptic Curve
Cryptography.



A Introduction 

Efficient algorithms for multiplication in $\mathbb{F}_{2^m}$ are required to implement
cryptosystems such as the Diffie-Hellman and elliptic curve cryptosystems defined
over $\mathbb{F}_{2^m}$. Efficient implementation of the field arithmetic in $\mathbb{F}_{2^m}$ depends greatly
on the particular basis used for the finite field. Two common choices of bases for
$\mathbb{F}_{2^m}$ are normal and polynomial. Normal bases seem more suitable for hardware
implementations (see [?]).

In this paper we describe a technique for multiplication in the finite field
$\mathbb{F}_{2^m}$, where the field elements are represented as binary polynomials modulo
an irreducible binary polynomial of degree m. The proposed method is about
2-5 times faster than the standard multiplication, and is particularly useful for
software implementation of elliptic curve cryptosystems over $\mathbb{F}_{2^m}$. It is based on
the observation that Lim/Lee’s method [?] (or comb method [?]), designed for
exponentiation, can be modified to work in $\mathbb{F}_{2^m}$.

The remainder of this paper is organized as follows. In Section B we describe
the finite field $\mathbb{F}_{2^m}$ using a polynomial basis, along with a description of the
standard algorithm for multiplication in $\mathbb{F}_{2^m}$. A simple version
of Lim/Lee’s method and two versions of the proposed method are described in
Section C. In Section D we present timing results on different computational
platforms.
B The Finite Field $\mathbb{F}_{2^m}$

B.1 Polynomial Basis Representation

In this section we describe the finite field $\mathbb{F}_{2^m}$, called a characteristic two finite
field or a binary finite field, in terms of a polynomial basis representation. Let
$f(x) = x^m + \sum_{i=0}^{m-1} f_i x^i$ (where $f_i \in \{0, 1\}$, for $i = 0, \ldots, m-1$) be an
irreducible polynomial of degree m over $\mathbb{F}_2$; the polynomial f(x) is called the reduction
polynomial. A polynomial basis is specified by a reduction polynomial. In such a
representation, the bit string $(a_{m-1} \cdots a_1 a_0)$ is taken to represent the polynomial

$$a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$$

over $\mathbb{F}_2$. Thus, the finite field $\mathbb{F}_{2^m}$ can be represented by the set of all polynomials
of degree less than m over $\mathbb{F}_2$. That is,

$$\mathbb{F}_{2^m} = \{(a_{m-1} \cdots a_1 a_0) \mid a_i \in \{0, 1\}\}.$$

The field arithmetic is implemented as polynomial arithmetic modulo f(x). In
this representation, addition and multiplication of $a = (a_{m-1} \cdots a_1 a_0)$ and
$b = (b_{m-1} \cdots b_1 b_0)$ are performed as follows:

- Addition: $a + b = (c_{m-1} \cdots c_1 c_0)$, where $c_i = (a_i + b_i) \bmod 2$.

- Multiplication: $c = a \cdot b = (c_{m-1} \cdots c_1 c_0)$, where the polynomial
  $c(x) = \sum_{i=0}^{m-1} c_i x^i$ is the remainder of the division of the polynomial
  $\left(\sum_{i=0}^{m-1} a_i x^i\right)\left(\sum_{i=0}^{m-1} b_i x^i\right)$ by f(x). That is, c = ab mod f.

For efficiency reasons, the reduction polynomial can be selected as a trinomial
$x^m + x^k + 1$, where $1 \le k \le m-1$, or a pentanomial $x^m + x^{k_3} + x^{k_2} + x^{k_1} + 1$,
where $1 \le k_1 < k_2 < k_3 \le m-1$. ANSI X9.62 [2] specifies several rules for
choosing the reduction polynomial.

In software implementations, we partition the bit representation of a field
element $a = (a_{m-1} \cdots a_1 a_0)$ into blocks of the same size. Let w be the word
size of a computer (typical values are w = 8, 16, 32, 64), and s be the number of
words required to pack a into words. That is, $s = \lceil m/w \rceil$. Then, we can write
a as an sw-bit number consisting of s words, where each word is of length w.
Thus, we can write

$$a = (A_{s-1} \ldots A_1 A_0),$$

where each $A_i$ is of length w and defined as^1

$$A_i = (a_{iw+w-1} \cdots a_{iw+1} a_{iw}) \in \mathbb{F}_2^w .$$

In polynomial terms,

$$a(x) = \sum_{i=0}^{s-1} A_i(x)\, x^{iw} = \sum_{i=0}^{s-1} \sum_{j=0}^{w-1} a_{iw+j}\, x^{iw+j} .$$

^1 The leftmost sw − m bits of $A_{s-1}$ are set to 0.
B.2 Recent Methods for Multiplication in $\mathbb{F}_{2^m}$

In recent years, several algorithms for software multiplication in $\mathbb{F}_{2^m}$ have been
reported; however, we are interested in techniques that can be used when m is
prime.^2 In Schroeppel et al. [?], various programming tricks are discussed for
implementing the “shift-and-add” method, a basic algorithm for multiplication
in $\mathbb{F}_{2^m}$. A slight variant of this method is described by De Win et al. [12]. In
Koç [?], a word-level Montgomery multiplication algorithm in $\mathbb{F}_{2^m}$ is proposed.
This method is significantly faster than the standard method whenever the
multiplication of two words of size w, each one representing a binary polynomial,
can be performed in a few cycles. Since this operation is not available in most
general purpose processors, the alternative is to use table lookup. This approach
requires, for example, 128 Kbytes for w = 8 and 16 Gbytes for w = 16, making
it less attractive for practical applications. Another well known method for
multiplication in $\mathbb{F}_{2^m}$ is that of Karatsuba (see for example [?]).

B.3 The “Shift-and-Add” Method 

In this section we describe the basic method for computing $c(x) = a(x) \cdot b(x) \bmod f(x)$
in $\mathbb{F}_{2^m}$. It is analogous to the binary method for exponentiation, with the
square and multiplication operations being replaced by the SHIFT (multiplication
of a field element by x) and field addition operations, respectively. Thus, the
“shift-and-add” method processes the bits of polynomial a(x) from left to right,
and uses the following equation to perform c = ab mod f:

$$c(x) = x(\cdots x(x\, a_{m-1} b(x) + a_{m-2} b(x) \bmod f(x)) + \cdots) + a_0 b(x) \bmod f(x).$$

Assume that $a(x) = \sum_{i=0}^{m-1} a_i x^i$, $b(x) = \sum_{i=0}^{m-1} b_i x^i$, and $f(x) = \sum_{i=0}^{m} f_i x^i$.
Then the steps of the “shift-and-add” method are given below.

Algorithm 1. The “shift-and-add” method.

Input: $a = (A_{s-1} \ldots A_0)$, $b = (B_{s-1} \ldots B_0)$, and $f = (F_{s-1} \ldots F_0)$.
Output: $c = (C_{s-1} \ldots C_0) = a \cdot b \bmod f$.

1. Set $k \leftarrow m - 1 - w(s - 1)$, $c \leftarrow 0$
2. for i from s − 1 downto 0 do
       for j from k downto 0 do
           Set $c \leftarrow$ SHIFT(c)
           if $a_{iw+j} = 1$ then $c \leftarrow c \oplus b$
           if $c_m = 1$ then $c \leftarrow c \oplus f$
       Set $k \leftarrow w - 1$
3. return (c).
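The steps of Algorithm 1 can be sketched in Python by storing each binary polynomial as an integer whose bit i is the coefficient of $x^i$. This is an illustration under that assumption (a single big integer instead of s machine words); the function name is hypothetical.

```python
def shift_and_add(a: int, b: int, f: int, m: int) -> int:
    """Return a(x)*b(x) mod f(x) over F_2.

    a, b: polynomials of degree < m, f: reduction polynomial of degree m,
    all encoded as Python ints (bit i = coefficient of x^i).
    """
    c = 0
    for j in range(m - 1, -1, -1):  # scan the bits of a from left to right
        c <<= 1                     # SHIFT: multiply c by x
        if (a >> j) & 1:
            c ^= b                  # field addition of b
        if (c >> m) & 1:
            c ^= f                  # reduce when the coefficient of x^m is set
    return c

# Example in F_{2^3} with f(x) = x^3 + x + 1: (x + 1)(x^2 + 1) = x^2
print(bin(shift_and_add(0b011, 0b101, 0b1011, 3)))
```

Because c has degree less than m before each SHIFT, a single conditional addition of f per bit is enough to keep it reduced.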

This algorithm requires m − 1 shift operations and m field additions on average,
but the number of field additions can be reduced by selecting the reduction
polynomial f(x) as a trinomial or a pentanomial. Observe that in this algorithm,
the multiplication step (the computation of $d(x) = a(x) \cdot b(x)$) and the reduction
step (the computation of $c(x) = d(x) \bmod f(x)$) are integrated. Since for
the proposed algorithm these steps are separated, we include Algorithm 2 for
performing the reduction step. Assume that $f(x) = x^m + g(x)$, where the degree
of polynomial g(x) is less than m − w.

^2 Many standards that include elliptic curves defined over $\mathbb{F}_{2^m}$ recommend, for security
reasons, the use of binary finite fields with the property that m be prime.



Algorithm 2. Modular reduction.

Input: $a = (A_{n-1} \ldots A_{s-1} \ldots A_0)$, and $f = (F_{s-1} \ldots F_0)$.
Output: $c = (C_{s-1} \ldots C_0) = a \bmod f$

1. for i from n − 1 downto s do
       Set $d \leftarrow iw - m$
       Set $t \leftarrow A_i(x)\, x^d \cdot f(x) = \sum_{j=0}^{w-1} a_{iw+j}\, x^{d+j} \cdot f(x)$
       // $t = (T_i \ldots T_{i-s}\, 0 \ldots 0)$, where $T_i = A_i$ //
       for j from i downto i − s do
           Set $A_j \leftarrow A_j \oplus T_j$
2. Set $t \leftarrow \sum_{j=m}^{sw-1} a_j\, x^{j-m} \cdot f(x)$
   // $t = (T_{s-1} \ldots T_0)$ //
3. for j from s − 1 downto 0 do
       Set $A_j \leftarrow A_j \oplus T_j$
4. return ($c \leftarrow (A_{s-1} \ldots A_0)$).
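The idea of cancelling up to w leading bits at a time by adding a shifted multiple of f(x) can be sketched as follows. This is not a word-array transcription of Algorithm 2: polynomials are held in Python integers, `clmul` is a hypothetical helper for carry-less multiplication, and the example polynomial $x^{163} + x^7 + x^6 + x^3 + 1$ is assumed here only for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of two binary polynomials."""
    r = 0
    while a:
        if a & 1:
            r ^= b
        a >>= 1
        b <<= 1
    return r

def reduce_mod(d: int, f: int, m: int, w: int = 32) -> int:
    """Reduce d(x) modulo f(x) = x^m + g(x), assuming deg g < m - w."""
    top = d.bit_length()
    while top > m:
        lo = max(m, top - w)       # bits lo .. top-1 are cancelled in this pass
        chunk = d >> lo            # polynomial formed by those leading bits
        # chunk * x^(lo-m) * f(x) has the same leading bits as d, and its
        # remaining terms have degree < lo because deg g < m - w.
        d ^= clmul(chunk, f) << (lo - m)
        top = lo
    return d

F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # assumed example polynomial
```

Each pass removes at most w leading bits, which mirrors the word-by-word progress of step 1; the final partial pass plays the role of steps 2 and 3.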



Algorithm 2 works by zeroing out the most significant word of a(x) in each
iteration of step 1. A chosen multiple of the reduction polynomial f(x) is added to
a(x), which lowers the degree of a(x) by w. This is possible because the degree of
g(x) is less than m − w. Finally, the leading sw − m bits of $A_{s-1}$ are canceled in
step 3, obtaining a polynomial of degree less than m. The number of XOR
operations will depend on the weight of the reduction polynomial f(x). For example,
if f(x) is a pentanomial then Algorithm 2 requires at most 8n XOR operations.

Remark 1. Standard programming tricks, such as separately named
variables and loop unrolling, can be used to improve the performance of both
Algorithms 1 and 2. See [?] for some suggested programming optimizations.



C Proposed Method 

In this section we describe two versions of the new algorithm for multiplication
in $\mathbb{F}_{2^m}$. The first version is a straightforward extension of Lim/Lee’s method,
which does not require extra temporary memory. The second version is based
on a window technique. Before we describe the proposed algorithms, we discuss
a simple version of Lim/Lee’s method for exponentiation, using the terminology
of additive groups; this will help us to understand the extension to $\mathbb{F}_{2^m}$.
In order to compute the “multiplication” $a \cdot g$ (the addition of g to itself a
times) where a is an integer and g is an element of an additive group, the number
a is divided into s words of size w. Then a can be written as

$$a = (A_{s-1} \ldots A_1 A_0) = \sum_{i=0}^{s-1} A_i\, 2^{wi},$$

where each $A_i$, $0 \le i < s$, has the binary representation $(a_{iw+w-1} \cdots a_{iw+1} a_{iw})_2$.
Based on the binary representation $(u_{s-1} \ldots u_1 u_0)_2$ of u, $1 \le u < 2^s$, and the
group elements $2^{wi} \cdot g$, $0 \le i \le s - 1$, define the vector P[u] of precomputations
by the following equation:

$$P[u] = u_{s-1} 2^{(s-1)w} \cdot g + u_{s-2} 2^{(s-2)w} \cdot g + \cdots + u_1 2^{w} \cdot g + u_0 \cdot g.$$

Then the multiplication $a \cdot g = \sum_{i=0}^{s-1} A_i 2^{wi} \cdot g$ can be computed as

$$a \cdot g = \sum_{j=0}^{w-1} 2^j \left( \sum_{i=0}^{s-1} a_{iw+j}\, 2^{iw} \cdot g \right) = \sum_{j=0}^{w-1} 2^j P[I_j], \qquad (1)$$

where $I_j = (a_{(s-1)w+j} \cdots a_{w+j} a_j)_2$. A detailed algorithm for computing $a \cdot g$
using Lim/Lee’s precomputation technique is given in Algorithm 3.



Algorithm 3. Lim/Lee’s algorithm.

Input: $a = \sum_{i=0}^{s-1} A_i 2^{wi}$, $A_i = (a_{iw+w-1} \ldots a_{iw})_2$, $0 \le i < s$, and g.
Output: $r = a \cdot g$

// Precomputation //
1. for u from 0 to $2^s - 1$ do
       Set $u \leftarrow (u_{s-1} \cdots u_1 u_0)_2$
       Set $P[u] \leftarrow \sum_{i=0}^{s-1} u_i 2^{wi} \cdot g$
// Main Computation //
2. Set $r \leftarrow 0$
3. for j from w − 1 downto 0 do
       Set $r \leftarrow r + r$
       Set $u \leftarrow (a_{(s-1)w+j} \cdots a_{w+j} a_j)_2$
       Set $r \leftarrow r + P[u]$
4. return (r).
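A compact way to see Algorithm 3 at work is to take the additive group of integers, so that $a \cdot g$ is an ordinary product and the result is easy to check. The sketch below makes that assumption; the function name is hypothetical.

```python
def comb_multiply(a: int, g: int, w: int, s: int) -> int:
    """Compute a*g with the comb precomputation, for 0 <= a < 2**(w*s).

    The group is (Z, +) for illustration: doubling is r + r and the
    'group elements' 2^(wi)*g are plain integers.
    """
    # Precomputation: P[u] = sum_i u_i * 2^(wi) * g
    P = [0] * (1 << s)
    for u in range(1, 1 << s):
        P[u] = sum(((u >> i) & 1) * (g << (w * i)) for i in range(s))

    # Main computation: one doubling and one table addition per column j.
    r = 0
    for j in range(w - 1, -1, -1):
        r = r + r
        u = 0
        for i in range(s):
            u |= ((a >> (i * w + j)) & 1) << i   # u = (a_{(s-1)w+j} ... a_j)_2
        r = r + P[u]
    return r
```

With w = 4 and s = 4 the main loop performs only 4 doublings for a 16-bit multiplier, at the cost of a table with $2^s$ entries.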



Algorithm 3 performs well in situations where the group element g is known in
advance, since the calculation of the precomputation step can be made off-line.
A faster version of this algorithm, with more precomputations, is discussed in
[?].

Next we explain the extension of Algorithm 3 to the finite field $\mathbb{F}_{2^m}$. Let a
and b be two polynomials in $\mathbb{F}_{2^m}$. Assume that a can be represented as
$a = (A_{s-1} \ldots A_0)$. By replacing 2 by x and $2^{wi} \cdot g$ by $x^{wi} b(x)$ in (1), we obtain the
following formal expression for the product a(x)b(x):
$$a(x)b(x) = \sum_{j=0}^{w-1} x^j \left( \sum_{i=0}^{s-1} a_{iw+j}\, x^{wi}\, b(x) \right).$$

It is easy to verify that the above formula for a(x)b(x) is indeed correct. Then an
algorithm, analogous to Algorithm 3, can be derived for computing ab mod f when
b is a polynomial known in advance. By observing that the operation $x^{wi} b(x)$
is virtually free (it consists of an arrangement of the words representing b), the
precomputation of the $2^s - 1$ polynomials $P[u] = \sum_{i=0}^{s-1} u_i\, x^{wi}\, b(x)$, $1 \le u < 2^s$,
$u = (u_{s-1} \ldots u_0)_2$, can be made online. This eliminates the need of storing $2^s - 1$
polynomials, and the resulting algorithm is faster than Algorithm 1, even when
b is not a fixed polynomial. The details of this method are given in Algorithm 4.



Algorithm 4. Left-to-right comb method

Input: $a = (A_{s-1} \ldots A_0)$, $b = (B_{s-1} \ldots B_0)$, and $f = (F_{s-1} \ldots F_0)$.
Output: $c = (C_{s-1} \ldots C_0) = ab \bmod f$

1. Set $T_i \leftarrow 0$; i = 0, …, 2s − 1
2. for j from w − 1 downto 0 do
       for i from 0 to s − 1 do
           if the jth bit of $A_i$ is 1 then
               for k from 0 to s − 1 do
                   Set $T_{k+i} \leftarrow T_{k+i} \oplus B_k$
       if $j \ne 0$ then $T \leftarrow xT$ // shift T //
3. Set $c \leftarrow T \bmod f$ // Use Algorithm 2 //
4. return (c).
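Steps 1 and 2 of Algorithm 4 translate almost line by line into code on word arrays. The sketch below covers only the multiplication step and returns the unreduced 2s-word product; the reduction of step 3 is left out. Word lists are assumed to be least significant word first, and the function name is hypothetical.

```python
def comb_mul_ltr(A, B, w=32):
    """Unreduced product a(x)*b(x) as a list of 2s w-bit words.

    A, B: lists of s words, least significant word first.
    """
    s = len(A)
    mask = (1 << w) - 1
    T = [0] * (2 * s)
    for j in range(w - 1, -1, -1):
        for i in range(s):
            if (A[i] >> j) & 1:            # jth bit of A_i
                for k in range(s):
                    T[k + i] ^= B[k]       # add x^(wi) * b(x): a word offset only
        if j != 0:
            carry = 0                      # T <- x*T: one-bit shift across words
            for k in range(2 * s):
                nxt = T[k] >> (w - 1)
                T[k] = ((T[k] << 1) | carry) & mask
                carry = nxt
    return T
```

The inner XOR loop touches whole words of b at a word offset i, which is why the "multiplication" by $x^{wi}$ costs nothing beyond indexing.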



Note that Algorithm 4 requires w − 1 SHIFT operations of a polynomial of
2s words. By observing that a(x)b(x) can be written as

$$\sum_{j=0}^{w-1} \left( \sum_{i=0}^{s-1} a_{iw+j}\, x^{wi}\, x^{j} b(x) \right),$$

a right-to-left version of Algorithm 4 can be derived. The resulting algorithm is
an improvement over Algorithm 4 since the word length of the shifted polynomial
($x^j b(x)$) is less than 2s.

The idea of window methods [?, pp. 66] for exponentiation can be extended to
Algorithm 4 to obtain a more efficient algorithm, provided that extra temporary
memory is available. For example, if we define the precomputed vector $P_{16}[u]$
for $0 \le u < 16$, using the equation

$$P_{16}[u](x) = (u_3 x^3 + u_2 x^2 + u_1 x + u_0)\, b(x),$$

where $u = (u_3 \ldots u_0)_2$, then the product a(x)b(x) can be computed as
$$\begin{aligned} a(x)b(x) &= \sum_{i=0}^{s-1} \sum_{j=0}^{w-1} a_{iw+j}\, x^{iw+j}\, b(x) \\ &= \sum_{j=0}^{w-1} x^{j} \sum_{i=0}^{s-1} a_{iw+j}\, x^{iw}\, b(x) \\ &= \sum_{j=0}^{w/4-1} x^{4j} \sum_{i=0}^{s-1} x^{iw} \left( a_{iw+4j+3}\, x^3 + \cdots + a_{iw+4j+1}\, x + a_{iw+4j} \right) b(x) \\ &= \sum_{j=0}^{w/4-1} x^{4j} \left( \sum_{i=0}^{s-1} x^{iw}\, P_{16}[u_{i,j}](x) \right), \end{aligned}$$

where $u_{i,j} = (a_{iw+4j+3} \cdots a_{iw+4j})_2$.

Based on the above formula for ab, we derived an algorithm that processes
simultaneously four bits of each word of a and trades in each iteration four
multiplications by x for one multiplication by $x^4$. This method is described in
Algorithm 5.

Algorithm 5. Left-to-right comb method with windows of width 4

Input: $a = (A_{s-1} \ldots A_0)$, $b = (B_{s-1} \ldots B_0)$, and $f = (F_{s-1} \ldots F_0)$.
Output: $c = (C_{s-1} \ldots C_0) = ab \bmod f$.

1. for j from 0 to 15 do
       Set $P_{16}[j] \leftarrow (j_3 x^3 + \cdots + j_0)\, b(x)$, $j = (j_3 j_2 j_1 j_0)_2$
2. Set $T_i \leftarrow 0$; i = 0, …, 2s − 1
3. for j from w/4 − 1 downto 0 do
       for i from 0 to s − 1 do
           Set $u_{i,j} \leftarrow \lfloor A_i / 2^{4j} \rfloor \bmod 16$
           for k from 0 to s − 1 do
               Set $T_{k+i} \leftarrow T_{k+i} \oplus P_{16}[u_{i,j}][k]$
       if $j \ne 0$ then $T \leftarrow x^4 T$
4. Set $c \leftarrow T \bmod f$ // Use Algorithm 2 //
5. return (c).
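The windowed variant can be sketched in the same style. As before this covers only the unreduced product, with words stored least significant first. One deliberate deviation, made here as an assumption: each table entry is stored in s + 1 words so that the up to three extra high bits of $(u_3 x^3 + \cdots + u_0) b(x)$ are not lost when b fills all s words.

```python
def comb_mul_window4(A, B, w=32):
    """Unreduced product a(x)*b(x) as 2s words, processing 4 bits per step.

    A, B: lists of s w-bit words, least significant word first; w % 4 == 0.
    """
    s = len(A)
    mask = (1 << w) - 1
    b = sum(word << (w * k) for k, word in enumerate(B))

    # Precomputation: P16[u] = (u3 x^3 + u2 x^2 + u1 x + u0) * b(x)
    P16 = []
    for u in range(16):
        p = 0
        for t in range(4):
            if (u >> t) & 1:
                p ^= b << t
        P16.append([(p >> (w * k)) & mask for k in range(s + 1)])

    T = [0] * (2 * s)
    for j in range(w // 4 - 1, -1, -1):
        for i in range(s):
            u = (A[i] >> (4 * j)) & 0xF        # u_{i,j}
            for k in range(s + 1):
                T[k + i] ^= P16[u][k]
        if j != 0:
            carry = 0                          # T <- x^4 * T
            for k in range(2 * s):
                nxt = T[k] >> (w - 4)
                T[k] = ((T[k] << 4) | carry) & mask
                carry = nxt
    return T
```

Compared with the bit-serial comb, the outer loop runs w/4 times instead of w, and each pass performs one table lookup per word of a.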

Remark 2. When b is known in advance, Algorithm 5 can be modified to work
with a larger window size. If we process eight bits at the same time, then we
need 256 field elements of precomputations. By observing that
$\sum_{i=0}^{7} a_i x^i b(x) = \sum_{j=0}^{3} a_j x^j b(x) + x^4 \sum_{j=0}^{3} a_{4+j}\, x^j b(x)$,
we reduce the precomputation to 32 field
elements at the expense of doing more XOR operations.

C.l Performance Comparison 

Let us compare the performance of Algorithms 4 and 5. We calculate the number
of XOR operations and SHIFT operations required in each algorithm. We assume
that the reduction polynomial is a pentanomial, so the total number of XOR
operations required by Algorithm 2 is at most 8(2s − 1). Therefore, Algorithm 4
requires w − 1 SHIFT operations of a 2s-word polynomial and sm/2 + 8(2s − 1)
XOR operations on average. Similarly, Algorithm 5 requires 3 SHIFT operations
of an s-word polynomial for the precomputation step, w/4 − 1 SHIFT^3 operations
of a 2s-word polynomial and s(11 + m/4) + 8(2s − 1) XOR operations on average.
Thus, the time saved in Algorithm 5 is at the expense of using 14 field elements of
temporary memory. In Table 1 we compare the number of operations required
by Algorithms 1, 4 and 5, for the particular case m = 163, w = 32, s = 6, and
the pentanomial f{x) = + x® + + 1. 

Table 1. Number of operations for Algorithms 1, 4 and 5.

| | XOR | s-word SHIFT |
|---|---|---|
| Algorithm 1 | 81*6 + 81*2 = 648 | 162 |
| Algorithm 4 | 81*6 + 42 = 528 | 62 |
| Algorithm 5 | 52*6 + 42 = 354 | 17 |



D Timings 

This section presents timing results for the proposed algorithms and the
“shift-and-add” method on the following platforms: a 233 MHz Pentium MMX, a 400
MHz Pentium II, a 450 MHz Sun UltraSparc workstation and a 10 MHz Intel
386 processor (RIM interactive pager [3]). The implementation was written
entirely in C, and the compilers used were gcc for the Sun workstation and the
Pentium MMX, and Microsoft Visual C++ (version 6.0) for the other
architectures. All algorithms were implemented with a comparable level of programming
optimizations.

Table 2 shows running times to perform a multiplication in $\mathbb{F}_{2^{163}}$.^4 The results
in this table illustrate that Algorithm 4 is about 45% to 49% faster than the
standard method. We also found that Algorithm 5 is about 3.0 to 5.5 times
faster than the standard method; however, this significant speedup is at the expense
of using an extra storage of 336 bytes.

Table 2. Timings (in microseconds) for multiplication in $\mathbb{F}_{2^{163}}$.

| | RIM 10 MHz | Pentium 233 MHz | Pentium II 400 MHz | UltraSparc 450 MHz |
|---|---|---|---|---|
| Algorithm 1 | 4,848 | 31.27 | 16.48 | 10.97 |
| Algorithm 4 | 2,500* | 17.02 | 8.40 | 5.55 |
| Algorithm 5 | 1,515 | 10.20 | 3.00 | 2.52 |

(*estimated)
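The relative figures quoted for Algorithms 4 and 5 follow directly from the timings in Table 2. The short script below recomputes them; the dictionary layout and function name are illustrative choices, and the RIM value for Algorithm 4 is the estimated one from the table.

```python
# Timings in microseconds from Table 2: (Algorithm 1, Algorithm 4, Algorithm 5)
TIMINGS_US = {
    "RIM 10 MHz": (4848.0, 2500.0, 1515.0),
    "Pentium 233 MHz": (31.27, 17.02, 10.20),
    "Pentium II 400 MHz": (16.48, 8.40, 3.00),
    "UltraSparc 450 MHz": (10.97, 5.55, 2.52),
}

def relative_gains(t1: float, t4: float, t5: float):
    """Return (fraction of time saved by Algorithm 4, speedup factor of Algorithm 5)."""
    return 1.0 - t4 / t1, t1 / t5

for platform, times in TIMINGS_US.items():
    saved4, speedup5 = relative_gains(*times)
    print(f"{platform}: Algorithm 4 saves {saved4:.1%}, Algorithm 5 is {speedup5:.2f}x faster")
```

On every platform the time saved by Algorithm 4 lies between roughly 45% and 49%, and the speedup of Algorithm 5 lies between roughly 3.0 and 5.5.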



^3 We are assuming that multiplying a polynomial by $x^4$ is comparable in speed to
multiplying a polynomial by x.

^4 Recently, NIST has recommended elliptic curves over $\mathbb{F}_{2^{163}}$ for US federal government
use [10].
D.1 Applications

The most important application of this work is in software implementations of
elliptic curve cryptography over $\mathbb{F}_{2^m}$. Based on the fast version of the proposed
method (Algorithm 5) we implemented an algorithm for the computation of an
elliptic scalar multiplication, which is the central operation used by elliptic curve
cryptosystems. In Table 3 we show the timing results of our implementation of
the Montgomery method for computing kP, where k is an integer and P is an
arbitrary point on a random curve defined over $\mathbb{F}_{2^{163}}$.

Table 3. Timings (in milliseconds) for computing kP, P an arbitrary point.

| Operation | RIM 10 MHz | Pentium 233 MHz | Pentium II 400 MHz | UltraSparc 450 MHz |
|---|---|---|---|---|
| kP | 1,650* | 10.96 | 3.24 | 2.74 |

(*estimated)



E Conclusions 

In this paper we have shown a technique for speeding up the computation of
c = ab mod f in $\mathbb{F}_{2^m}$. The software implementation of the proposed method,
on different platforms, proved to be significantly faster than the “shift-and-add”
method, making it useful for software implementations of elliptic curve
cryptography on modern workstations as well as on wireless devices such as the RIM pager
(a hand-held device with an Intel processor running at 10 MHz [3]).
Abstract. In cryptographic applications, the use of normal bases to
represent elements of the finite field GF($2^m$) is quite advantageous,
especially for hardware implementation. In this article, we consider an
important field operation, namely, multiplication which is used in many
cryptographic functions. We present a class of algorithms for normal
basis multiplication in GF($2^m$). Our proposed multiplication algorithm
for composite finite fields requires a significantly lower number of bit level
operations and hence can reduce the space complexity of cryptographic
systems when implemented in hardware.



A Introduction 

Many cryptographic functions, such as key exchange, signing and verification,
require a significant amount of computation in the finite field GF($2^m$). The
elements of such a field can be represented in different ways. The choice of the
representation plays an important role in determining the complexity of a finite
field arithmetic unit and, consequently, that of a cryptographic system. Among
the various ways one can represent field elements, the use of normal bases has
drawn significant attention, especially for implementing cryptographic functions
in hardware.

In a normal basis representation, squaring can be performed simply by a cyclic
shift of the coordinates of an element, and hence in hardware it is almost free
of cost. Such a cost advantage often makes the normal basis a preferred choice
of representation. However, a normal basis multiplication is not so simple. In
[?], Massey and Omura proposed a normal basis multiplication scheme which
can be implemented in bit-parallel fashion using m identical logic blocks whose
inputs are cyclically shifted from one another [11]. Although this normal basis
multiplier offers modularity, its space complexity^1 is considerably high.

In the recent past, considerable efforts have been made, for example in [?], [?],
[?], [?], and [8], to reduce the space complexity of the normal basis multiplier. In
[?], two special types of normal bases were reported which are known as type-I
and type-II optimal normal bases. The use of these optimal normal bases can
considerably reduce the complexity of the multiplier [?], [?], [?].

^1 Conventionally, the space complexity of the GF($2^m$) multiplier is given in terms of
the number of logic gates, namely XOR and AND gates, which correspond to GF(2)
(i.e., bit level) addition and multiplication, respectively.
In this article, we first present an algorithm for multiplication in GF($2^m$).
This algorithm is quite generic in the sense that it is not restricted to any
special type of normal bases. Compared to other generic algorithms for normal basis
multiplication in GF($2^m$), the proposed one requires fewer bit level
multiplications. Although this is achieved at the expense of extra bit level additions, the
total number of GF(2) operations is the same as that of the best known generic
algorithm. Our algorithm is then applied to the type-I optimal normal basis to
further reduce the number of bit level operations. We then present an algorithm
for normal basis multiplication in composite finite fields. This algorithm
significantly reduces bit level operations, in terms of both addition and multiplication
over GF(2). To show the advantage of the proposed algorithms, we compare our
results with those of the best known normal basis multipliers.



B Preliminaries 

B.l Normal Basis Representation 

It is well known that there exists a normal basis (NB) in the field GF($2^m$) over
GF(2) for all positive integers m. By finding an element $\beta \in$ GF($2^m$) such that
$\{\beta, \beta^2, \cdots, \beta^{2^{m-1}}\}$ is a basis of GF($2^m$) over GF(2), any element $A \in$ GF($2^m$)
can be represented as

$$A = \sum_{i=0}^{m-1} a_i \beta^{2^i} = a_0 \beta + a_1 \beta^2 + \cdots + a_{m-1} \beta^{2^{m-1}}, \qquad (1)$$

where $a_i \in$ GF(2), $0 \le i \le m-1$, is the i-th coordinate of A. In short, this
normal basis representation of A will be written as $A = (a_0, a_1, \cdots, a_{m-1})$. In
vector notation, equation (1) will be written as

$$A = a \cdot \beta^T , \qquad (2)$$

where $a = [a_0, a_1, \cdots, a_{m-1}]$, $\beta = [\beta, \beta^2, \cdots, \beta^{2^{m-1}}]$, and T denotes vector
transposition.

The main advantage of the NB representation is that an element A can be
easily squared by a cyclic shift of its coordinates.
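The cyclic-shift property is easy to check on a small field. The sketch below assumes GF($2^3$) built with the reduction polynomial $x^3 + x^2 + 1$ and $\beta = x$, for which $\{\beta, \beta^2, \beta^4\}$ is a normal basis; these concrete choices and all function names are illustrative and do not come from the text.

```python
F, M = 0b1101, 3          # assumed example field: GF(2^3), f(x) = x^3 + x^2 + 1

def gf_mul(a: int, b: int) -> int:
    """Polynomial-basis multiplication modulo F (shift-and-add)."""
    r = 0
    for j in range(M - 1, -1, -1):
        r <<= 1
        if (a >> j) & 1:
            r ^= b
        if (r >> M) & 1:
            r ^= F
    return r

BASIS = [0b010]                                   # beta = x
for _ in range(M - 1):
    BASIS.append(gf_mul(BASIS[-1], BASIS[-1]))    # beta^2, beta^4

def nb_to_poly(coords):
    """Convert normal basis coordinates (a_0, ..., a_{m-1}) to a polynomial."""
    v = 0
    for a_i, basis_elem in zip(coords, BASIS):
        if a_i:
            v ^= basis_elem
    return v

def nb_square(coords):
    """A^2 = sum a_i beta^(2^(i+1)): rotate the coordinates by one position."""
    return coords[-1:] + coords[:-1]
```

Squaring in the polynomial representation needs a multiplication and a reduction, whereas in normal basis coordinates it is a pure rewiring of the bits.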

B.2 Normal Basis Multiplication 

Let A and B be any two elements of GF($2^m$) and be represented w.r.t. the
NB as $A = a \cdot \beta^T$ and $B = b \cdot \beta^T$, respectively. Let C denote their
product, i.e.,

$$C = A \cdot B = (a \cdot \beta^T) \cdot (\beta \cdot b^T) = a\, \mathbf{M}\, b^T , \qquad (3)$$

where the multiplication matrix $\mathbf{M}$ is defined by

$$\mathbf{M} = \beta^T \cdot \beta = \left[ \beta^{2^i + 2^j} \right]_{i,j=0}^{m-1} . \qquad (4)$$

All entries of $\mathbf{M}$ belong to GF($2^m$) and if they are written w.r.t. the NB, then
the following is obtained

$$\mathbf{M} = M_0 \beta + M_1 \beta^2 + \cdots + M_{m-1} \beta^{2^{m-1}}, \qquad (5)$$

where the $M_i$'s are m×m matrices whose entries belong to GF(2). Substituting (5)
into (3), the coordinates of C are found as follows

$$c_i = a\, M_i\, b^T = a^{(i)} M_0\, b^{(i)T}, \quad 0 \le i \le m - 1 , \qquad (6)$$

where $a^{(i)}$ is the i-fold left cyclic shift of a and the same is for $b^{(i)}$ [?].

Definition 1. The numbers of 1's in all $M_i$'s are equal. Let us define this
number by

$$C_N = H(M_i), \quad 0 \le i \le m-1, \qquad (7)$$

which is known as the complexity of the NB [?]. In (7), $H(M_i)$ refers to the
Hamming weight, i.e., the number of 1's, in $M_i$.

C A New Multiplication Algorithm 



In (4), the multiplication matrix $\mathbf{M}$ is symmetric and its diagonal entries are the
elements of the NB. Thus, we can write

$$\mathbf{M} = U + U^T + D, \qquad (8)$$

where $D = \mathrm{diag}(\beta^2, \beta^{2^2}, \cdots, \beta^{2^{m-1}}, \beta)$ is a diagonal matrix and U is an upper
triangular matrix with zeros at diagonal entries as given below

$$U = \begin{bmatrix} 0 & \beta^{2^0+2^1} & \beta^{2^0+2^2} & \cdots & \beta^{2^0+2^{m-1}} \\ 0 & 0 & \beta^{2^1+2^2} & \cdots & \beta^{2^1+2^{m-1}} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \beta^{2^{m-2}+2^{m-1}} \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix} . \qquad (9)$$

In (9), the exponents of $\beta$ can be represented in the binary form using m bits
where each exponent will have exactly two 1's. The cyclic distance between the
two 1's is in the range [1, v], where $v = \lfloor m/2 \rfloor$. For example, with m = 4 each of
the following three exponents, viz., $3 = 0011_2$, $6 = 0110_2$, and $9 = 1001_2$, has a
cyclic distance of one; additionally $\beta^6$ as well as $\beta^9$ can be obtained from $\beta^3$ by
squaring operations which are free of cost in the normal basis representation. In
this regard, let us define

$$\delta_i \triangleq \beta^{1+2^i}, \quad i = 1, 2, \cdots, v. \qquad (10)$$

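Lemma 1. Let A and B be two elements of GF($2^m$) and C be their product.
Then

$$C = \begin{cases} \sum_{i=0}^{m-1} a_i b_i \beta^{2^i} + \sum_{i=1}^{v} \sum_{j=0}^{m-1} y_{j,i}\, \delta_i^{2^j} & \text{for } m \text{ odd} \\ \sum_{i=0}^{m-1} a_i b_i \beta^{2^i} + \sum_{i=1}^{v-1} \sum_{j=0}^{m-1} y_{j,i}\, \delta_i^{2^j} + \sum_{j=0}^{v-1} y_{j,v}\, \delta_v^{2^j} & \text{for } m \text{ even} \end{cases} \qquad (11)$$

where

$$y_{j,i} = (a_j + a_{((i+j))})(b_j + b_{((i+j))}), \quad 1 \le i \le v, \ 0 \le j \le m - 1. \qquad (12)$$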

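Let $h_i$, $1 \le i \le v$, be the number of 1's in the normal basis representation
of $\delta_i$, i.e., $h_i = H(\delta_i)$. Let $w_{i,1}, w_{i,2}, \cdots, w_{i,h_i}$ denote the positions of 1's in the
normal basis representation of $\delta_i$, i.e.,

$$\delta_i = \sum_{k=1}^{h_i} \beta^{2^{w_{i,k}}}, \quad 1 \le i \le v, \qquad (13)$$

where $0 \le w_{i,1} < w_{i,2} < \cdots < w_{i,h_i} \le m-1$. Also, for even values of m, one can
obtain

$$\delta_v = \sum_{k=1}^{h_v/2} \left( \beta^{2^{w_{v,k}}} + \beta^{2^{w_{v,k}+v}} \right). \qquad (14)$$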

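Now, substituting (13) and (14) into (11), we have the following theorem.

Theorem 1. Let A and B be two elements of GF($2^m$) and C be their product.
Then

$$C = \begin{cases} \sum_{i=0}^{m-1} a_i b_i \beta^{2^i} + \sum_{i=1}^{v} \sum_{k=1}^{h_i} \left( \sum_{j=0}^{m-1} y_{j,i}\, \beta^{2^{((j+w_{i,k}))}} \right), & \text{for } m \text{ odd} \\ \sum_{i=0}^{m-1} a_i b_i \beta^{2^i} + \sum_{i=1}^{v-1} \sum_{k=1}^{h_i} \left( \sum_{j=0}^{m-1} y_{j,i}\, \beta^{2^{((j+w_{i,k}))}} \right) + D, & \text{for } m \text{ even} \end{cases} \qquad (15)$$

where

$$D = \sum_{k=1}^{h_v/2} \sum_{j=0}^{v-1} y_{j,v} \left( \beta^{2^{((j+w_{v,k}))}} + \beta^{2^{((j+w_{v,k}+v))}} \right) \quad \text{and} \quad v = \frac{m}{2}.$$

Note that for a normal basis, the representation of $\delta_i$ is fixed and so is
$w_{i,k}$, $1 \le i \le v$, $1 \le k \le h_i$. Based on (15), we now have the following
algorithm for low complexity normal basis (LCNB) multiplication.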






Algorithm 1 (Low Complexity Normal Basis Multiplication over $GF(2^m)$)
Input: $A, B \in GF(2^m)$, $w_{i,k}$, $1 \le i \le v$, $1 \le k \le h_i$
Output: $C = AB$

1.  Generate $y_{j,i} = (a_j + a_{((i+j))})(b_j + b_{((i+j))})$, $1 \le i < v$, $0 \le j \le m-1$,
    where $y_{j,i} \in GF(2)$.
2.  Initialize $c_j := a_j b_j$, $0 \le j \le m-1$, $C := (c_0, c_1, \ldots, c_{m-1})$
3.  For $i = 1$ to $v - 1$ {
4.      For $k = 1$ to $h_i$ {
5.          $r_j := y_{((j - w_{i,k})),i}$, $0 \le j \le m-1$, $R := (r_0, r_1, \ldots, r_{m-1})$
6.          $C := C + R$
7.      }
8.  }
9.  If $m$ is odd,
10.     $s := h_v$, $t := m$
11. else $s := h_v/2$, $t := m/2$
12. Generate $y_{j,v} = (a_j + a_{((v+j))})(b_j + b_{((v+j))})$, $0 \le j \le t-1$
13. If $m$ is even, $y_{j+v,v} = y_{j,v}$, $0 \le j \le \frac{m}{2} - 1$
14. For $k = 1$ to $s$ {
15.     ...
16.     If $m$ is even,
17.         ..., $0 \le j \le \frac{m}{2} - 1$, $R := \ldots$
18.     $C := C + R$
19. }
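The shift-and-add structure of the LCNB algorithm can be sketched in Python for odd $m$, where no special handling of $i = v$ is needed. This is a hedged illustration rather than the authors' implementation: the function names, the brute-force search for a normal element, and the test polynomials ($x^3 + x^2 + 1$ and $x^5 + x^2 + 1$) are all assumptions introduced here. The positions $w_{i,k}$ are obtained by looking up the normal basis coordinates of $\delta_i = \beta \cdot \beta^{2^i}$, and adding $\sum_j y_{j,i}\beta^{2^{((j+w_{i,k}))}}$ becomes a cyclic shift of the vector $(y_{0,i}, \ldots, y_{m-1,i})$ by $w_{i,k}$ positions.

```python
from itertools import product

def gf_mul(x, y, poly, m):
    """Multiply x*y in GF(2^m), polynomial basis, reduction polynomial `poly`."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> m) & 1:
            x ^= poly
    return r

def normal_basis(poly, m):
    """Brute-force search for a normal element; returns (basis, element -> coordinates)."""
    for beta in range(2, 1 << m):
        nb = [beta]
        for _ in range(m - 1):
            nb.append(gf_mul(nb[-1], nb[-1], poly, m))
        table = {}
        for coords in product((0, 1), repeat=m):
            e = 0
            for j, bit in enumerate(coords):
                if bit:
                    e ^= nb[j]
            table[e] = coords
        if len(table) == 1 << m:          # the m conjugates are linearly independent
            return nb, table
    raise ValueError("no normal element found")

def lcnb_multiply_odd(a, b, w):
    """Product of coordinate vectors a, b for odd m; w[i] lists the 1-positions of delta_i."""
    m = len(a)
    c = [a[j] & b[j] for j in range(m)]
    for i in range(1, m // 2 + 1):
        y = [(a[j] ^ a[(i + j) % m]) & (b[j] ^ b[(i + j) % m]) for j in range(m)]
        for wk in w[i]:
            for j in range(m):
                c[j] ^= y[(j - wk) % m]   # cyclic shift of y by wk positions
    return tuple(c)

def check(poly, m):
    """Compare the shift-and-add product with direct multiplication for all inputs."""
    nb, table = normal_basis(poly, m)
    elem = {coords: e for e, coords in table.items()}
    w = {i: [k for k, bit in enumerate(table[gf_mul(nb[0], nb[i], poly, m)]) if bit]
         for i in range(1, m // 2 + 1)}
    return all(
        lcnb_multiply_odd(a, b, w) == table[gf_mul(elem[a], elem[b], poly, m)]
        for a in elem for b in elem
    )
```

The exhaustive coordinate table is only practical for very small $m$; it is used here so that the check does not depend on any precomputed multiplication table.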





Theorem 2. For the LCNB multiplication algorithm, let $\#Mult_{LCNB}$ and $\#Add_{LCNB}$ denote the numbers of bit level multiplications and additions, respectively. Then

$$\#Mult_{LCNB} = \frac{m(m+1)}{2}, \tag{16}$$

$$\#Add_{LCNB} = \frac{m}{2}(C_N + 2m - 3). \tag{17}$$



Multipliers    #Mult                 #Add                          Total bit operations
MO [11]        $m^2$                 $m(C_N - 1)$                  $m(C_N + m - 1)$
RR_MO [8]      $m^2$                 $\frac{m}{2}(C_N + m - 2)$    $\frac{m}{2}(C_N + 3m - 2)$
LCNB           $\frac{m(m+1)}{2}$    $\frac{m}{2}(C_N + 2m - 3)$   $\frac{m}{2}(C_N + 3m - 2)$

Table 1. Comparison of normal basis multipliers.
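The operation counts compared here are closed forms in $m$ and $C_N$, so they are easy to evaluate mechanically. The short Python sketch below is an illustration with function names of our own choosing; the formulas are those of Theorem 2 and of the MO and RR_MO multipliers, with $\#Mult = m^2$ for the latter two taken from the numerical values this article reports for $GF(2^{33})$.

```python
def mo_ops(m, cn):
    """Massey-Omura multiplier: (#Mult, #Add)."""
    return m * m, m * (cn - 1)

def rr_mo_ops(m, cn):
    """Reduced redundancy Massey-Omura multiplier: (#Mult, #Add)."""
    return m * m, m * (cn + m - 2) // 2

def lcnb_ops(m, cn):
    """LCNB algorithm, equations (16) and (17): (#Mult, #Add)."""
    return m * (m + 1) // 2, m * (cn + 2 * m - 3) // 2

# Example: m = 33 with an optimal normal basis, C_N = 2m - 1 = 65
for name, ops in (("MO", mo_ops), ("RR_MO", rr_mo_ops), ("LCNB", lcnb_ops)):
    mult, add = ops(33, 65)
    print(name, mult, add, mult + add)
```

For $m = 33$ and $C_N = 65$ the totals of RR_MO and LCNB coincide, while LCNB uses roughly half as many multiplications.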



Table 1 compares the number of bit level operations of the LCNB algorithm with those of the Massey-Omura (MO) multiplier of [11] and the reduced redundancy Massey-Omura (RR_MO) multiplier of [8]. The multipliers of [11] and [8] are used for comparison as they appear to be the first and the most recently reported work in this area, respectively, and the total number of bit level operations of [8] appears to be the least among the existing normal basis schemes. As can be seen from the table, the total number of bit level operations of our new LCNB algorithm matches that of [8]. More importantly, the LCNB algorithm has the least number of bit level multiplications. Since a bit level multiplication corresponds to a multiplication in the ground field $GF(2)$, if the algorithm is extended to a ground field of degree more than one, where a multiplication is more expensive than an addition, the use of the LCNB algorithm will be advantageous. This is investigated in Section 5 of this article.

In Table 1, the numbers of bit level additions (and consequently, the total operations) are given in terms of $C_N$. It is well known that $C_N \ge 2m - 1$ [7]. If a normal basis has the minimum $C_N$, i.e., $C_N = 2m - 1$, then it is called an optimal normal basis (ONB). There are two types of ONBs, namely, type-I and type-II, which are hereafter also referred to as ONB-I and ONB-II, respectively.
The ONBs do not exist for all m. The list in shows that only 23% of m < 2000 
have ONBs. For a given $m$ where an ONB exists, the minimum number of bit level additions needed in the LCNB algorithm can be obtained by substituting $C_N = 2m - 1$ in (17), i.e., for an ONB we have

$$\#Add_{LCNB} = 2m(m - 1). \tag{18}$$

Below we show that the number of bit level additions can be further reduced by considering ONB-I.



4 Type-I Optimal Normal Basis Multiplication

An ONB-I is generated by the roots of an irreducible all-one polynomial (AOP). An AOP of degree $m$ has all its $m + 1$ coefficients equal to 1, i.e.,

$$P(x) = x^m + x^{m-1} + \cdots + x + 1. \tag{19}$$

The AOP is irreducible if $m + 1$ is prime and 2 is primitive modulo $m + 1$ [10]. Thus, the roots of (19), i.e., $\beta^{2^i}$, $i = 0, 1, \ldots, m - 1$, form an ONB-I if and only if $m + 1$ is prime and 2 is primitive modulo $m + 1$. Since $m + 1$ is prime, $m$ is even, and thus $v = \frac{m}{2}$. Also,

$$\delta_i = \beta^{1+2^i} = \begin{cases} \beta^{2^{k_i}}, & i = 1, 2, \ldots, \frac{m}{2} - 1 \\ 1, & i = \frac{m}{2} \end{cases} \tag{20}$$

where $k_i$ is obtained from

$$(2^i + 1) \bmod (m + 1) = 2^{k_i} \bmod (m + 1). \tag{21}$$

Substituting (20) into (11), the product $C$ can be written as

$$C = \sum_{j=0}^{m-1} a_j b_j \beta^{2^j} + \sum_{i=1}^{v-1} \left( \sum_{j=0}^{m-1} y_{j,i}\, \beta^{2^j} \right)^{2^{k_i}} + \left( \sum_{j=0}^{v-1} y_{j,v} \right), \tag{22}$$

where the rightmost summation results in 0 or 1, and in the normal basis representation, 0 and 1 correspond to $(0, 0, \ldots, 0)$ and $(1, 1, \ldots, 1)$, respectively. Based on (22), we can now state an algorithm for ONB-I multiplication as follows.

Algorithm 2 (Low Complexity ONB-I Multiplication over $GF(2^m)$)
Input: $A, B \in GF(2^m)$, $k_i$, $1 \le i \le v - 1$, $v = \frac{m}{2}$
Output: $C = AB$

1.  Generate $y_{j,i} = (a_j + a_{((i+j))})(b_j + b_{((i+j))})$, $1 \le i \le v - 1$, $0 \le j \le m - 1$,
2.  Generate $y_{j,v} = (a_j + a_{((v+j))})(b_j + b_{((v+j))})$, $0 \le j \le v - 1$,
3.  Initialize $c_j := a_j b_j$, $0 \le j \le m - 1$, $f := y_{0,v}$, $f \in GF(2)$
4.  For $i = 1$ to $v - 1$ {
5.      $r_j := y_{j,i}$, $0 \le j \le m - 1$, $R := (r_0, r_1, \ldots, r_{m-1})$
6.      $R := R^{2^{k_i}}$
7.      $C := C + R$
8.      $f := f + y_{i,v}$
9.  }
10. If $f$ is 1, $C := C + (1, 1, \ldots, 1, 1)$

In line 6 of the above algorithm (hereafter referred to as LCONB-I), the operation $R^{2^{k_i}}$ can be accomplished by $k_i$ cyclic shifts. The numbers of bit level additions of lines 1, 2 and 8 are $2m(v - 1)$, $2v$ and $v - 1$, respectively. Also, lines 7 and 10 need $m(v - 1)$ and $m$ additions. Thus, with $v = m/2$, the total number of additions is

$$\#Add_{LCONB\text{-}I} = 1.5m^2 - 0.5m - 1, \tag{23}$$

and the number of multiplications is the same as that of the LCNB algorithm given in (16).
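The shift amounts $k_i$ and the addition count of the LCONB-I algorithm can be computed directly from their definitions. The Python sketch below is illustrative only; the function names are hypothetical, and the existence test simply checks that 2 has multiplicative order $m$ modulo $m + 1$ (which also forces $m + 1$ to be prime).

```python
def onb1_shift_amounts(m):
    """Return [k_1, ..., k_{v-1}], v = m/2, with (2^i + 1) mod (m+1) = 2^(k_i) mod (m+1)."""
    p = m + 1
    powers = {pow(2, k, p): k for k in range(m)}   # discrete logarithms to base 2
    if len(powers) != m or pow(2, m, p) != 1:
        raise ValueError("2 is not primitive modulo m + 1, so no ONB-I exists")
    return [powers[(2**i + 1) % p] for i in range(1, m // 2)]

def lconb1_additions(m):
    """Bit level additions summed line by line: lines 1, 2, 8, 7 and 10 (v = m/2)."""
    v = m // 2
    return 2 * m * (v - 1) + 2 * v + (v - 1) + m * (v - 1) + m

print(onb1_shift_amounts(10))   # shift amounts for GF(2^10), where m + 1 = 11
print(lconb1_additions(10))     # agrees with 1.5 m^2 - 0.5 m - 1
```

For $m = 10$ the line-by-line sum gives 144, which equals $1.5m^2 - 0.5m - 1$.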

For comparison, we consider four other ONB-I multipliers as shown in Table 2. The multiplier of [11] is considered to be the first such work published in the open literature, and those of [1], [2], [8] are more recent works and have the best results among the known existing ones. As can be seen in this table, although the total number of operations of the proposed LCONB-I algorithm is the same as those of the three best multiplication schemes, the LCONB-I algorithm requires the least number of bit level multiplications, which can be advantageous in composite finite fields as discussed below.



Multipliers          #Mult                 #Add                   Total bit operations
MO [11]              $m^2$                 $2m^2 - 2m$            $3m^2 - 2m$
Hasan et al. [1]     $m^2$                 $m^2 - 1$              $2m^2 - 1$
Koc and Sunar [2]    $m^2$                 $m^2 - 1$              $2m^2 - 1$
RR_MO [8]            $m^2$                 $m^2 - 1$              $2m^2 - 1$
LCONB-I              $\frac{m(m+1)}{2}$    $1.5m^2 - 0.5m - 1$    $2m^2 - 1$

Table 2. Comparison of bit level operations of ONB-I based multiplication schemes.

5 Composite Field Multiplication Algorithm

In this section, we consider multiplications in the finite field $GF(2^m)$ where $m$ is a composite number. These fields are referred to as composite fields. They have been used in the recent past to develop efficient multiplication schemes. However, concerns have been raised regarding the use of such fields in elliptic curve cryptosystems when $m$ has a small factor (around 3-15) [4].

Theorem 3. Let $m_1 > 1$, $m_2 > 1$ be relatively prime. Let $N_1 = \{\beta_1^{2^i} \mid 0 \le i \le m_1 - 1\}$ and $N_2 = \{\beta_2^{2^j} \mid 0 \le j \le m_2 - 1\}$ be normal bases for $GF(2^{m_1})$ and $GF(2^{m_2})$, respectively. Then $N = \{\beta_1^{2^i}\beta_2^{2^j} \mid 0 \le i \le m_1 - 1, \; 0 \le j \le m_2 - 1\}$ is a normal basis for $GF(2^{m_1 m_2})$ over $GF(2)$. The complexity of $N$ is $C_N = C_{N_1} C_{N_2}$, where $C_{N_1}$ and $C_{N_2}$ are the complexities of $N_1$ and $N_2$, respectively.

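Assume that $m = m_1 \cdot m_2$ where $m_1$ and $m_2$ are as defined above. Let $A \in GF(2^m)$; then $A$ can be represented w.r.t. the basis

$$N = \{\beta^{2^i} \mid 0 \le i \le m - 1\}, \quad \beta = \beta_1\beta_2,$$

as follows:

$$A = \sum_{i=0}^{m-1} a_i \beta^{2^i} = \sum_{i=0}^{m_1 m_2 - 1} a_i\, \beta_1^{2^i} \beta_2^{2^i} = \sum_{j=0}^{m_1 - 1} A_j \beta_1^{2^j}, \tag{24}$$

where the $a_i$'s are the coordinates of $A$ and

$$A_j = \sum_{l=0}^{m_2 - 1} a_{j + l m_1}\, \beta_2^{2^{j + l m_1}}. \tag{25}$$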



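We assume this kind of representation for any two elements $A$ and $B \in GF((2^{m_2})^{m_1})$, i.e., $A = \sum_{j=0}^{m_1-1} A_j \beta_1^{2^j}$ and $B = \sum_{j=0}^{m_1-1} B_j \beta_1^{2^j}$, where $A_j, B_j \in GF(2^{m_2})$. Without loss of generality, the product $C = AB$ can then be obtained from Lemma 1 as

$$C = \begin{cases}
\sum_{j=0}^{m_1-1} \left[ A_j B_j \beta_1^{2^j} + \sum_{i=1}^{v_1} Y_{j,i}\, \gamma_i^{2^j} \right], & \text{for } m_1 \text{ odd} \\
\sum_{j=0}^{m_1-1} \left[ A_j B_j \beta_1^{2^j} + \sum_{i=1}^{v_1-1} Y_{j,i}\, \gamma_i^{2^j} \right] + \sum_{j=0}^{v_1-1} Y_{j,v_1}\, \gamma_{v_1}^{2^j}, & \text{for } m_1 \text{ even}
\end{cases} \tag{26}$$

where $v_1 = \lfloor m_1/2 \rfloor$, $\gamma_i = \beta_1^{1+2^i}$, $1 \le i \le v_1$, and

$$Y_{j,i} = (A_j + A_{((i+j))})(B_j + B_{((i+j))}), \quad 1 \le i \le v_1, \; 0 \le j \le m_1 - 1. \tag{27}$$

In (27), $((i+j)) = i + j \bmod m_1$ and the underlying field operations are performed over the subfield $GF(2^{m_2})$.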

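Also, using (13), one can write $\gamma_i$ w.r.t. $N_1$ as

$$\gamma_i = \sum_{k=1}^{h_i^{(1)}} \beta_1^{2^{w_{i,k}^{(1)}}}, \quad 1 \le i \le v_1, \tag{28}$$

and similar to (15), the product $C$ can also be obtained as

$$C = \begin{cases}
\sum_{j=0}^{m_1-1} A_j B_j \beta_1^{2^j} + \sum_{i=1}^{v_1} \sum_{k=1}^{h_i^{(1)}} \left( \sum_{j=0}^{m_1-1} Y_{j,i}\, \beta_1^{2^{((j+w_{i,k}^{(1)}))}} \right), & \text{for } m_1 \text{ odd} \\
\sum_{j=0}^{m_1-1} A_j B_j \beta_1^{2^j} + \sum_{i=1}^{v_1-1} \sum_{k=1}^{h_i^{(1)}} \left( \sum_{j=0}^{m_1-1} Y_{j,i}\, \beta_1^{2^{((j+w_{i,k}^{(1)}))}} \right) + F, & \text{for } m_1 \text{ even}
\end{cases} \tag{29}$$

where

$$F = \sum_{k=1}^{h_{v_1}^{(1)}/2} \sum_{j=0}^{v_1-1} Y_{j,v_1} \left( \beta_1^{2^{((j+w_{v_1,k}^{(1)}))}} + \beta_1^{2^{((j+w_{v_1,k}^{(1)}+v_1))}} \right) \quad \text{and} \quad v_1 = \frac{m_1}{2}.$$

Based on (29), we can state the following algorithm for multiplication in $GF(2^m)$ where $m = m_1 \cdot m_2$.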



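Algorithm 3 (Composite Field Normal Basis Multiplication)
Input: $A, B \in GF((2^{m_2})^{m_1})$, $\gamma_i \in GF(2^{m_1})$, $1 \le i \le v_1$
Output: $C = AB$

1.  $A_j[l'] := a_{j + m_1 l}$, $B_j[l'] := b_{j + m_1 l}$, $0 \le j \le m_1 - 1$, $0 \le l \le m_2 - 1$,
    where $l' = j + m_1 l \bmod m_2$
2.  Generate $Y_{j,i} := (A_j + A_{((i+j))})(B_j + B_{((i+j))})$, $1 \le i < v_1$, $0 \le j \le m_1 - 1$,
    where $Y_{j,i}, A_j, B_j \in GF(2^{m_2})$
3.  Initialize $C_j := A_j B_j$, $0 \le j \le m_1 - 1$, $C := C_0 \mid C_1 \mid \cdots \mid C_{m_1 - 1}$
4.  For $i = 1$ to $v_1 - 1$ {
5.      For $k = 1$ to $h_i^{(1)}$ {
6.          $R_j := Y_{((j - w_{i,k}^{(1)})),i}$, $0 \le j \le m_1 - 1$, $R := R_0 \mid R_1 \mid \cdots \mid R_{m_1 - 1}$
7.          $C := C + R$
8.      }
9.  }
10. If $m_1$ is odd,
11.     $s := h_{v_1}^{(1)}$, $t := m_1$
12. else $s := h_{v_1}^{(1)}/2$, $t := \frac{m_1}{2}$
13. Generate $Y_{j,v_1} = (A_j + A_{((v_1+j))})(B_j + B_{((v_1+j))})$, $0 \le j \le t - 1$
14. If $m_1$ is even, $Y_{j+v_1,v_1} = Y_{j,v_1}$, ...
15. For $k = 1$ to $s$ {
16.     $R_j := \ldots$
17.     If $m_1$ is even,
18.         ...
19.     $C := C + R$
20. }
21. $c_{j + m_1 l} := C_j[l']$, $0 \le j \le m_1 - 1$, $0 \le l \le m_2 - 1$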



In Algorithm 3, $C$ in line 3 is obtained by concatenating the $C_j$'s. $R$ in line 6 is obtained in a similar way. The total number of operations of the composite field NB (CFNB) multiplication algorithm consists of two parts: multiplications and additions over the subfield $GF(2^{m_2})$. Using Theorem 2, the numbers of multiplications and additions over $GF(2^{m_2})$ are $\frac{m_1(m_1+1)}{2}$ and $\frac{m_1}{2}(C_{N_1} + 2m_1 - 3)$, respectively. Each $GF(2^{m_2})$ addition can be performed by $m_2$ bit level (i.e., $GF(2)$) additions. If we use Algorithm 1 for subfield operations, then at the bit level each $GF(2^{m_2})$ multiplication requires $\frac{m_2(m_2+1)}{2}$ multiplications and $\frac{m_2}{2}(C_{N_2} + 2m_2 - 3)$ additions. Thus, the total numbers of bit level operations are as follows:

$$\#Mult_{CFNB} = \frac{m(m_1 + 1)(m_2 + 1)}{4} \tag{30}$$

and

$$\#Add_{CFNB} = \frac{m_1}{2}(C_{N_1} + 2m_1 - 3) \cdot m_2 + \frac{m_1(m_1+1)}{2} \cdot \frac{m_2}{2}(C_{N_2} + 2m_2 - 3)
= \frac{m}{2}\left[ C_{N_1} + 2m_1 - 3 + \frac{m_1 + 1}{2}(C_{N_2} + 2m_2 - 3) \right]. \tag{31}$$

Thus, for a given $m$, we can use $m_1 < m_2$ to reduce the number of addition operations given in (31). Additionally, if $m_2 + 1$ is prime and 2 is primitive modulo $m_2 + 1$, then there exists an ONB-I over $GF(2^{m_2})$ and Algorithm 2 can be used for $GF(2^{m_2})$ multiplication. Thus, using (23), the number of additions as given in (31) can be reduced to $\frac{m_1}{2}(C_{N_1} + 2m_1 - 3)\, m_2 + \frac{m_1(m_1+1)}{2}(1.5m_2^2 - 0.5m_2 - 1)$.

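Example 1. Let $m = 33$, $m_1 = 3$ and $m_2 = 11$. As per Table 3 of [7], there are ONBs for $GF(2^3)$ and $GF(2^{11})$. Thus, $N_1 = \{\beta_1^{2^j} \mid 0 \le j \le 2\}$ and $N_2 = \{\beta_2^{2^j} \mid 0 \le j \le 10\}$ are type-II optimal normal bases of $GF(2^3)$ and $GF(2^{11})$, respectively. Using Theorem 3, $N = \{\beta^{2^i} \mid 0 \le i \le 32\}$, where $\beta = \beta_1\beta_2$, is a normal basis of $GF(2^{33})$ over $GF(2)$. The complexity of $N$ is $C_N = C_{N_1} C_{N_2} = (2 \cdot 3 - 1)(2 \cdot 11 - 1) = 105$. Any two field elements $A, B \in GF(2^{33})$ can be written w.r.t. $N$ as

$$A = \sum_{i=0}^{32} a_i \beta^{2^i} = A_0\beta_1 + A_1\beta_1^2 + A_2\beta_1^4, \quad B = \sum_{i=0}^{32} b_i \beta^{2^i} = B_0\beta_1 + B_1\beta_1^2 + B_2\beta_1^4,$$

where $A_j = \sum_{l=0}^{10} a_{j+3l}\, \beta_2^{2^{l'}}$, $B_j = \sum_{l=0}^{10} b_{j+3l}\, \beta_2^{2^{l'}}$, $0 \le j \le 2$, and $l' = j + 3l \bmod 11$. Let $C = C_0\beta_1 + C_1\beta_1^2 + C_2\beta_1^4$ be the product of $A$ and $B$. Thus, using (26), we have

$$\begin{aligned}
C = {} & A_0B_0\beta_1 + (A_0 + A_1)(B_0 + B_1)\beta_1^3 \\
 & + A_1B_1\beta_1^2 + (A_1 + A_2)(B_1 + B_2)\beta_1^6 \\
 & + A_2B_2\beta_1^4 + (A_2 + A_0)(B_2 + B_0)\beta_1^{12}.
\end{aligned}$$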

Using Table 2 in [^, for the type-II ONB over GF{2^), we have Pf = Pi + Pf. 
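Thus, since the normal element $\beta_1$ of $GF(2^3)$ satisfies $\beta_1^3 = \beta_1 + \beta_1^4$ (and hence $\beta_1^6 = \beta_1 + \beta_1^2$ and $\beta_1^{12} = \beta_1^2 + \beta_1^4$),

$$\begin{aligned}
C = {} & \big(A_0B_0 + (A_0 + A_1)(B_0 + B_1) + (A_1 + A_2)(B_1 + B_2)\big)\beta_1 \\
 & + \big(A_1B_1 + (A_1 + A_2)(B_1 + B_2) + (A_2 + A_0)(B_2 + B_0)\big)\beta_1^2 \\
 & + \big(A_2B_2 + (A_2 + A_0)(B_2 + B_0) + (A_0 + A_1)(B_0 + B_1)\big)\beta_1^4.
\end{aligned} \tag{32}$$

From (32), we see that $\frac{m_1(m_1+1)}{2} = 6$ multiplications and $\frac{m_1}{2}(C_{N_1} + 2m_1 - 3) = 12$ additions over the subfield $GF(2^{11})$ are needed to generate $C_0$, $C_1$ and $C_2$. Thus, the total numbers of bit level multiplications and additions are 396 and 1452, respectively.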
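The bit level totals quoted in this example follow from applying the LCNB counts of Theorem 2 at two levels, once over the subfield and once inside it. The Python sketch below reproduces the arithmetic; the function names are hypothetical and the code only evaluates the counting formulas, it does not perform field multiplication.

```python
def lcnb_ops(m, cn):
    """Theorem 2: (#Mult, #Add) for an LCNB multiplication with parameters m, C_N."""
    return m * (m + 1) // 2, m * (cn + 2 * m - 3) // 2

def cfnb_ops(m1, cn1, m2, cn2):
    """Bit level (#Mult, #Add) for GF((2^m2)^m1) using LCNB at both levels."""
    sub_mults, sub_adds = lcnb_ops(m1, cn1)   # operations over GF(2^m2)
    bit_mults, bit_adds = lcnb_ops(m2, cn2)   # bit operations per subfield multiplication
    mults = sub_mults * bit_mults
    adds = sub_adds * m2 + sub_mults * bit_adds
    return mults, adds

# Example 1: m1 = 3, m2 = 11, both with optimal normal bases (C_N = 2m - 1)
print(lcnb_ops(3, 5))          # subfield multiplications and additions
print(cfnb_ops(3, 5, 11, 21))  # bit level multiplications and additions
```

With $m_1 = 3$, $C_{N_1} = 5$, $m_2 = 11$ and $C_{N_2} = 21$ this gives 6 and 12 subfield operations, and 396 multiplications and 1452 additions at the bit level.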
Table 3 compares bit level operations for multiplication over $GF(2^{33})$ for a number of algorithms. Rows 2, 3 and 4, where $C_N = 65$, use the ONB-II which exists for $GF(2^{33})$ over $GF(2)$. On the other hand, rows 5, 6 and 7, where $C_N = C_{N_1} \cdot C_{N_2} = 105$, use the two ONB-II's which exist for the subfields $GF(2^3)$ and $GF(2^{11})$ as discussed in the above example. This comparison shows that the proposed CFNB multiplier has the least number of bit level operations. More interestingly, for composite values of $m$, the well known optimal normal bases of $GF(2^m)$ over $GF(2)$ do not seem to be the best choice when one considers bit level operations, which in turn determine the space complexity for hardware implementation of a normal basis multiplier.



Multipliers    $C_N$    #Mult    #Add    Total bit operations
MO [11]        65       1089     2112    3201
RR_MO [8]      65       1089     1584    2673
LCNB           65       561      2112    2673
MO [11]        105      1089     3432    4521
RR_MO [8]      105      1089     2244    3333
LCNB           105      561      2772    3333
CFNB           105      396      1452    1848

Table 3. Comparison of operations for normal basis multipliers over $GF(2^{33})$.



We wind up this section by stating the following theorem which gives the 
bit level operations for normal basis multiplication over generalized composite 
fields. 



Theorem 4. Let $m = \prod_{i=1}^{n} m_i$, $1 < m_1 < m_2 < \cdots < m_n$, where $\gcd(m_i, m_j) = 1$, $i \ne j$. Then, for a normal basis multiplication over the composite field $GF(2^m)$, the numbers of bit level multiplications and additions are

$$\#Mult_{CFNB} = \frac{m}{2^n} \prod_{i=1}^{n} (m_i + 1), \tag{33}$$

and 



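$$\#Add_{CFNB} = \frac{m}{2}\left[ C_{N_1} + 2m_1 - 3 + \sum_{i=1}^{n-1} (C_{N_{i+1}} + 2m_{i+1} - 3) \prod_{j=1}^{i} \frac{m_j + 1}{2} \right], \tag{34}$$

respectively.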



6 Concluding Remarks

In this article, efficient algorithms for normal basis multiplication over $GF(2^m)$ have been proposed. It has been shown that when $m$ is composite, the proposed CFNB algorithm requires significantly fewer bit level operations compared to other similar algorithms available in the open literature. More interestingly, it has been shown that for composite values of $m$, the well known optimal normal bases of $GF(2^m)$ over $GF(2)$ do not seem to be the best choice when one considers bit level operations, which in turn determine the space complexity for hardware implementation of a normal basis multiplier.

The multiplication algorithms presented in this article are targeted towards efficient hardware implementation. However, they can be modified to suit software implementation on a general purpose processor. This is being considered for discussion in a future article.
Abstract. In this paper we present a single-round, single-server symmetrically private information retrieval scheme, in which privacy of the user follows from intractability of the quadratic residuacity problem and privacy of the database follows from the XOR assumption for quadratic residues introduced in this paper. The communication complexity of the proposed scheme for retrieving one bit can be made $O(n^\epsilon)$ for any $\epsilon > 0$, where $n$ is the number of bits in the database. We extend the protocol to a block retrieval scheme which is especially efficient when the number of records in the database is equal to the size of each record.

Keywords: Private Information Retrieval (PIR), Symmetrically Private Information Retrieval (SPIR), Quadratic Residuacity Assumption (QRA), Probabilistic Encryption, Cryptographic Protocols.

1 Introduction

In the age of the Internet, accessing remote databases is common, and information is the most sought after and costliest commodity. In such a situation it is very important not only to protect information but also to protect the identity of the information that a user is interested in. Consider the case when an investor wants to know the value of a certain stock, but is reluctant to reveal the identity of that stock, because it may expose his future intentions with regard to that stock. A scheme or protocol which enables a user to access a database and receive the desired information without exposing the identity of that information was first introduced by Chor et al. [2] in 1995 under the name of Private Information Retrieval. It was also independently studied by Cooper and Birman [3] in the context of implementing an anonymous messaging service for mobile users.

A Private Information Retrieval (PIR) scheme allows a user to retrieve bits from a database (DB) while ensuring that the database does not learn which bits were retrieved. The database is modeled as an array of bits $x$ held by a server, and the user wants to retrieve the bit $x_i$ for some $i$, without disclosing any information about $i$ to the server. We denote the number of bits (records) in the database by $n$. The communication complexity (i.e., the number of bits exchanged between the user and the DB) of such a protocol is denoted by $C(n)$. Also, the exchange of information between the user and the DB may be interactive or non-interactive. In the first case the protocol is multi-round, while in the second case it is single-round.

A trivial PIR scheme consists of sending the entire database to the user, resulting in $C(n) = n$. Clearly, such a scheme provides information theoretic privacy. Any PIR scheme with $C(n) < n$ is called non-trivial. Chor et al. [2] proved that under the requirement of information theoretic security, and with a single database involved in the protocol, trivial PIR is the only possible solution. A way to overcome this impossibility result is to consider $k > 1$ servers, each holding a copy of the database $x$. Chor et al. [2] presented a $k > 1$ server scheme with communication complexity $O(n^{1/k})$. This was improved by Ambainis [1] to a $k > 2$ server PIR scheme having a communication complexity of $O(n^{1/(2k-1)})$.

Another way of getting non-trivial PIR schemes is to lower the privacy requirement from information theoretic privacy to computational privacy. Chor and Gilboa [3] presented multi-server computationally private information retrieval schemes in which the privacy of the user is guaranteed against computationally bounded servers. Kushilevitz and Ostrovsky [7] showed that under the notion of computational privacy one can achieve a non-trivial PIR protocol even with a single server. In particular, [7] show that assuming the hardness of the quadratic residuacity problem, one can get a single database PIR protocol with communication complexity $O(n^\epsilon)$ for any $\epsilon > 0$.

While protecting the privacy of the user, it is equally important that the user should not be allowed to learn more than the amount of information that he/she pays for. This is referred to as database privacy and the corresponding protocol is called a Symmetrically Private Information Retrieval (SPIR) protocol. However, the database is assumed to be honest and to carry out its part of the protocol correctly. Gertner et al. presented general transformations of PIR schemes in the multi-server setting satisfying certain properties into SPIR schemes, with logarithmic overhead in communication complexity. Kushilevitz and Ostrovsky [7] noted that in the single-server setting, their PIR protocol can be converted into a SPIR protocol by employing zero knowledge techniques to verify the validity of the query. The problem is that the use of zero knowledge techniques implies that the resulting protocol is no longer a single-round protocol. Thus the question of getting a single-server, single-round non-trivial SPIR protocol was left open.

We provide the first solution to this problem by modifying the basic PIR scheme of [7] to ensure that privacy of the database is maintained. Our scheme introduces two new steps (preprocessing and postprocessing) in the database computation but does not increase the number of rounds. There is a communication overhead in the communication from the user to the database, but in the recursive scheme this is offset by the postprocessing step, which effectively decreases the number of bits sent by the database to the user. In fact, for just a PIR scheme the preprocessing step is not required and hence the total communication complexity is $K$ times less than that of [7]. The preprocessing step of the database computation empowers the DB to restrict the user to information from one column of the matrix formed from the database, while the postprocessing computation prevents the user from getting more than one bit from the column selected in the preprocessing step. Thus the user is constrained to get exactly a single bit of information from the database for each invocation of the protocol. The proof of user privacy is based on the intractability of the quadratic residuacity problem, and the proof of database privacy requires a new assumption which we call the XOR assumption. In the XOR assumption we assume that if $x, y \in Z_N^{+1}$ and $w = x \oplus y$, then from $w$ it is not possible to get any information about the quadratic character of $x$ and $y$, even if the user is computationally powerful enough to determine quadratic residuacity in $Z_N^{+1}$.

We extend the basic scheme into an efficient (in terms of communication complexity) protocol for SPIR by allowing the database to perform a depth first search on matrices of progressively smaller sizes. As a result we are able to prove the following (see [7] for a similar result on PIR protocols).

Theorem: For every $\epsilon > 0$, there exists a single-server, single-round computational SPIR scheme with communication complexity $O(n^\epsilon)$, where user privacy is based on QRA and database privacy is based on the XOR assumption.

We extend the bit retrieval scheme into a block retrieval scheme which is especially efficient when the number of records is equal to the size of a record.

Remark: Although we will present our schemes based on the Quadratic Residuacity Assumption, they can be based on more general assumptions. Following the approach of Mann [8], we can replace Quadratic Residuacity Predicates with any Homomorphic Trapdoor Predicates.



2 Preliminaries

Informally stated, a PIR scheme is a collection of three algorithms, the user's query generation algorithm $A_Q$, the database answer computation algorithm $A_D$ and the user's reconstruction algorithm $A_R$, such that it satisfies the following requirements:

Correctness: In every invocation of the scheme the user retrieves the desired bit, provided the user's query is correct.

User Privacy: In every invocation of the scheme the server does not gain any information about the index of the bit that the user retrieves.

PIR schemes can be classified into two groups on the basis of the privacy they provide to the user.

Information Theoretic Privacy: Given the query, the server or database cannot gain any information about the index the user is interested in, regardless of its computational power.

Computational Privacy: Here the server is assumed to be computationally bounded, say a polynomial size circuit. Thus for computational privacy, the probability distribution of the queries the user sends to the database when he is interested in the $i$-th bit, and the probability distribution of the queries when he is interested in the $i'$-th bit, should be computationally indistinguishable to the server.

A symmetrically private information retrieval (SPIR) scheme is a PIR scheme satisfying an additional requirement, the privacy of the database. Privacy can again be considered to be information theoretic or computational. Informally, computational data privacy requires that, for each execution, there exists an index $i$ such that the probability distributions on the user's view are computationally indistinguishable for any two databases $x$, $y$ such that $x_i = y_i$. That is, a computationally bounded user does not receive information about more than a single bit of the database, no matter how he forms his query of given length.

For the formal definitions of PIR protocols refer to Chor et al. [2], Kushilevitz and Ostrovsky [7] and Mann [8]. For the formal definitions of SPIR protocols see Gertner et al., Crescenzo et al. and Mishra. Mishra also contains a bibliography on this and related security problems.

3 The Basic Scheme

In this section we introduce a basic SPIR scheme for bit retrieval. We add two new and important steps - the preprocessing and the postprocessing steps - to the database computation in the protocol of [7]. The preprocessing step restricts user access to a particular column and the postprocessing step allows the user to get only a single bit from the column. This scheme in itself is not efficient, but it makes clear the main ideas which will be used in the efficient bit retrieval scheme based on a recursive database computation. In the following, by a QNR we will mean a quadratic non-residue whose Jacobi symbol is 1. Also, by $Z_N^{+1}$ we denote the set of all elements in $Z_N^*$ whose Jacobi symbol is 1.

For clarity, we use DB to denote the server or database program which handles the database. We view the $n$-bit database string $x$ as an $(s \times t)$ matrix of bits denoted by $D$. The user is interested in retrieving the $i$-th bit $x_i$ of the database $x$, which is the $(a, b)$-th entry in matrix $D$, where $(a, b) = getIndex(i, t)$. The algorithm $getIndex(\cdot, \cdot)$ is defined as follows.

getIndex(i, t) {
    if $t \mid i$, then $t_1 = i/t$ and $t_2 = t$
    else $t_1 = \lfloor i/t \rfloor + 1$ and $t_2 = i \bmod t$.
    return $(t_1, t_2)$.
}
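The index mapping is simple enough to state as runnable code. The following Python function is a direct transcription of getIndex under the assumption, implied by the definition, that both the bit index $i$ and the returned row and column are 1-based; the name `get_index` is ours.

```python
def get_index(i, t):
    """Map the 1-based bit index i to the 1-based (row, column) in a matrix with t columns."""
    if i % t == 0:
        return i // t, t
    return i // t + 1, i % t

# Bits 1..8 of a database laid out in rows of t = 4 bits
print([get_index(i, 4) for i in range(1, 9)])
```

The mapping is inverted by $i = (a - 1)t + b$, which is a convenient way to check an implementation.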

Query Generation (by user):

1. User generates two $\frac{K}{2}$-bit prime numbers $p$ and $q$, such that $p \equiv q \equiv 3 \pmod 4$, and sets $N = pq$. Here $K$ is the security parameter.

2. User chooses $t$ numbers at random $y_1, \ldots, y_t \in Z_N^{+1}$, such that $y_b$ is a QNR and $y_j$ ($j \ne b$) is a QR.

3. User chooses $s$ numbers at random $\gamma_1, \ldots, \gamma_s \in Z_N^{+1}$, such that $\gamma_a$ is a QNR and $\gamma_j$ ($j \ne a$) is a QR.

4. User sends $N$, $y_1, \ldots, y_t$ and $\gamma_1, \ldots, \gamma_s$ to DB. The factorization of $N$ is kept secret.

Database Computation:

1. Preprocessing (Column Control): For each bit $D(r, \tau)$ of matrix $D$, a $K$-bit number $\psi(r, \tau)$ is randomly chosen as follows. If $D(r, \tau) = 0$ then $\psi(r, \tau)$ is a QR, and if $D(r, \tau) = 1$ then $\psi(r, \tau)$ is a QNR. Denote by $\psi(r, \tau, \kappa)$ the $\kappa$-th bit of $\psi(r, \tau)$. DB now forms $K$ matrices of size $s \times t$, $D_\kappa$, $\kappa = 1, 2, \ldots, K$, as follows: $D_\kappa(r, \tau) = \psi(r, \tau, \kappa)$. Thus for the original matrix $D$, DB has formed $K$ matrices $D_\kappa$ of the same size. The database can always generate random QR's without knowing the factorization of $N$. The primes $p$, $q$ are chosen to be $p, q \equiv 3 \bmod 4$. This allows the DB to also randomly generate QNR's in $Z_N^{+1}$.

2. For each of the $K$ matrices $D_\kappa$, DB computes for every row $r$ of $D_\kappa$ a number $z(r, \kappa)$ as follows: $z(r, \kappa) = \nu_r \prod_{l=1}^{t} (y_l)^{D_\kappa(r, l)}$, where $\nu_r$ is a QR randomly chosen by DB. Thus, DB computes $s \times K$ numbers $z(r, \kappa)$, where each $z(r, \kappa)$ is a $K$-bit number. For $1 \le c \le K$, denote by $z(r, \kappa, c)$ the $c$-th bit of $z(r, \kappa)$.

3. Post Processing (Row Control): DB forms $K$ matrices $A_\kappa$, $1 \le \kappa \le K$, such that $A_\kappa(r, c) = z(c, \kappa, r)$. Now, for each of the $K$ matrices $A_\kappa$ ($1 \le \kappa \le K$), DB computes for every row $r$ ($1 \le r \le K$) a number $\zeta(r, \kappa)$ as follows: $\zeta(r, \kappa) = \nu_r \prod_{l=1}^{s} (\gamma_l)^{A_\kappa(r, l)}$, where $\nu_r$ is a QR randomly chosen by DB. Thus for each of the $K$ matrices $A_\kappa$, DB computes $K$ numbers $\zeta(1, \kappa), \zeta(2, \kappa), \ldots, \zeta(K, \kappa)$, where each of these numbers is itself a $K$-bit number. DB sends these $K^2$ numbers (a total of $K^3$ bits) to the user.

Bit Reconstruction: User retrieves the desired bit as follows:

1. Observe that $\zeta(r, \kappa)$ is a QR iff $A_\kappa(r, a)$ is 0 (see Lemma 1). Thus determining the quadratic character of the $K^2$ numbers $\zeta(r, \kappa)$ gives the user the bits $A_\kappa(1, a), A_\kappa(2, a), \ldots, A_\kappa(K, a)$.

2. From the construction of matrix $A_\kappa$, we see that $z(a, \kappa, r) = A_\kappa(r, a)$, and further $z(a, \kappa) = z(a, \kappa, 1), \ldots, z(a, \kappa, K)$ for all $1 \le \kappa \le K$. Thus for all $1 \le \kappa \le K$, the user gets $z(a, \kappa)$.

3. The quantity $z(a, \kappa)$ is a QR iff $D_\kappa(a, b)$ is 0 (see Lemma 1). Thus by determining the quadratic characters of the $K$ numbers $z(a, \kappa)$ ($1 \le \kappa \le K$), the user gets the bits $D_1(a, b), \ldots, D_K(a, b)$. From the construction of the matrices $D_\kappa$ it is clear that $D_\kappa(a, b) = \psi(a, b, \kappa)$, and further $\psi(a, b) = \psi(a, b, 1), \ldots, \psi(a, b, K)$. Using Lemma 1, $\psi(a, b)$ is a QR iff $D(a, b)$ is 0. Thus the user gets the bit $D(a, b)$.

Remark: The security parameter $K$ must satisfy $K \ge \max\{K_0, poly(\log n)\}$, where $n$ is the number of bits in the database and $K_0$ is the smallest number such that the encryption scheme under consideration (encryption by QR's and QNR's here) is secure. The $poly(\log n)$ factor comes in because DB is assumed to be resourceful enough to do a computation of $O(n)$.
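The three phases above (query generation, database computation with preprocessing and postprocessing, and bit reconstruction) can be traced end to end on a toy instance. The Python sketch below is an illustration under loudly stated assumptions: the primes $p = 7$, $q = 11$ are far too small to be secure and are chosen only so that the arithmetic is easy to follow, all function names are hypothetical, indices are 0-based, and QNR's are generated as $-1$ times a random QR, which has Jacobi symbol $+1$ because $p \equiv q \equiv 3 \pmod 4$.

```python
import math
import random

P, Q = 7, 11                 # toy primes, both 3 mod 4 (insecure, illustration only)
N = P * Q
K = N.bit_length()           # every element of Z_N fits in K bits

def is_qr(x):
    """User-side test using the secret factors (Euler's criterion modulo P and Q)."""
    return pow(x, (P - 1) // 2, P) == 1 and pow(x, (Q - 1) // 2, Q) == 1

def rand_qr(rng):
    """Random QR: the square of a random unit; needs no factorization."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            return r * r % N

def rand_qnr(rng):
    """Random QNR with Jacobi symbol +1, obtained as -1 times a random QR."""
    return N - rand_qr(rng)

def query(a, b, s, t, rng):
    """User: y_b and gamma_a are QNR's, all other entries are QR's."""
    ys = [rand_qnr(rng) if j == b else rand_qr(rng) for j in range(t)]
    gammas = [rand_qnr(rng) if j == a else rand_qr(rng) for j in range(s)]
    return ys, gammas

def masked_product(bits, bases, rng):
    """nu * prod(bases[l] ** bits[l]) mod N with a fresh random QR nu."""
    acc = rand_qr(rng)
    for bit, base in zip(bits, bases):
        if bit:
            acc = acc * base % N
    return acc

def db_answer(D, ys, gammas, rng):
    s, t = len(D), len(D[0])
    # preprocessing: encode each database bit as a random QR (0) or QNR (1)
    psi = [[rand_qnr(rng) if D[r][c] else rand_qr(rng) for c in range(t)] for r in range(s)]
    answer = []
    for kappa in range(K):
        z = [masked_product([(psi[r][c] >> kappa) & 1 for c in range(t)], ys, rng)
             for r in range(s)]
        # postprocessing: row r of A_kappa collects bit r of z(c, kappa) over all c
        answer.append([masked_product([(z[c] >> r) & 1 for c in range(s)], gammas, rng)
                       for r in range(K)])
    return answer            # answer[kappa][r] plays the role of zeta(r, kappa)

def reconstruct(answer):
    psi_ab = 0
    for kappa in range(K):
        z_a = sum((not is_qr(answer[kappa][r])) << r for r in range(K))
        psi_ab |= (not is_qr(z_a)) << kappa
    return int(not is_qr(psi_ab))
```

A usage example: with `rng = random.Random(1)` and `D = [[1, 0, 1], [0, 1, 1]]`, calling `reconstruct(db_answer(D, *query(a, b, 2, 3, rng), rng))` returns `D[a][b]` for every position, since correctness does not depend on the random choices.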

Correctness of the protocol follows from the following Lemma, which is not difficult to prove.

Lemma 1. Let $x = [x_1, x_2, \ldots, x_t]$ be a bit array of length $t$ and let $\chi_1, \ldots, \chi_t$ be chosen such that $\chi_1, \ldots, \chi_{i-1}, \chi_{i+1}, \ldots, \chi_t$ are QR's and $\chi_i$ is a QNR. Let $y = \prod_{l=1}^{t} (\chi_l)^{x_l}$. Then $y$ is a QR iff $x_i = 0$.
Privacy of User: Suppose there exists a family of polynomial time circuits $C_n$ that can distinguish between two query distributions $Q_i$ and $Q_{i'}$ (for two indices $i$ and $i'$ of the database) with probability larger than $\frac{1}{n^\ell}$ for some integer $\ell$. Then, following [7], we can construct another family of polynomial time circuits $C'_n$ from $C_n$, which will, on input $N$ and $y \in Z_N^{+1}$, compute the quadratic residuacity predicate with probability at least $\frac{1}{2} + \frac{1}{8 n^\ell}$.

Privacy of database: For an honest-but-curious user, database privacy is easy to prove, and so here we consider the case of a dishonest user. A dishonest user can deviate from the protocol to possibly gain extra information in the following ways:

1. $N$ is a product of more than two primes. It is not clear that the user can gain extra information by using such an $N$. Hence we will assume that $N$ is indeed a product of two primes.

2. Assuming that $N$ is a product of two primes, the numbers that the user sends to DB must be in $Z_N^{+1}$. This can be enforced, since the DB can compute Jacobi symbols without knowing the factorization of $N$ and reject a query not conforming to this specification. Hence the only way a dishonest user can cheat is to send more than one QNR in a query set. We now argue that this provides the user with no information at all, i.e., even if one query set has two QNR's, the user does not get even one bit of the database.

Note that even if the user chooses the QR's and QNR's with some "special" properties, this will not help, since in the computations of $z(r, \kappa)$ and $\zeta(r, \kappa)$, the multiplication by a randomly chosen QR $\nu_r$ will destroy these properties. Similar to Lemma 1 we have

Lemma 2. Let $\chi_1, \ldots, \chi_t$ be in $QR \cup QNR$, let $b_1, \ldots, b_t$ be a bit string, and let a number $z$ be computed as $z = \prod_{l=1}^{t} (\chi_l)^{b_l}$. Suppose $\chi_{i_1}, \ldots, \chi_{i_s}$ are QNR's and the rest are QR's; then $z$ is a QR iff $b_{i_1} \oplus \cdots \oplus b_{i_s} = 0$.
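Lemma 2 can be checked exhaustively for a small modulus. The Python sketch below is an illustration only: the modulus $N = 7 \cdot 11$ is a toy value, the sample elements 4 and 9 (QR's) and 76 and 73 (QNR's with Jacobi symbol $+1$, being $-1$ and $-4$ modulo 77) are chosen by us, and `lemma2_holds` is a hypothetical helper name.

```python
from itertools import product

P, Q = 7, 11                 # toy primes, both 3 mod 4 (illustration only)
N = P * Q

def is_qr(x):
    """Quadratic residuacity modulo N, decided with the factors via Euler's criterion."""
    return pow(x, (P - 1) // 2, P) == 1 and pow(x, (Q - 1) // 2, Q) == 1

def lemma2_holds(chis, bits):
    """Check: prod(chi_l ** b_l) is a QR iff the XOR of the bits at QNR positions is 0."""
    z, parity = 1, 0
    for chi, bit in zip(chis, bits):
        if bit:
            z = z * chi % N
            if not is_qr(chi):
                parity ^= 1
    return is_qr(z) == (parity == 0)

chis = [4, 76, 9, 73]        # QR, QNR, QR, QNR
print(all(lemma2_holds(chis, bits) for bits in product((0, 1), repeat=4)))
```

With two QNR's in the list, the user learns only the XOR of the two corresponding bits, which is the situation analysed in the dishonest-user cases.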

The problem in establishing database security is that we cannot base it on QRA, since the user is capable of determining quadratic residuacity in $Z_N^{+1}$. Indeed, the user is required to determine quadratic residues for the protocol to work. However, we will see later that if the user sends more than one QNR in his query set, then he receives an element $z$ in $Z_N$ which is the XOR of some randomly chosen elements $\chi_1, \ldots, \chi_t$ in $Z_N^{+1}$. We would like to argue that from $z$ the user gets no information about the individual quadratic characters of any of the $\chi_i$'s, even though he knows the factorization of $N$. We make this requirement more formal in the following manner.

XOR Assumption: Let $z = \chi_1 \oplus \cdots \oplus \chi_t$, where $\chi_1, \ldots, \chi_t \in Z_N^{+1}$. Let $X = \{\chi_1, \ldots, \chi_t\}$ and let $A$ be an arbitrary subset of $X$. Then we assume that for $N$ sufficiently large, $Prob(A \subseteq QR, \; X - A \subseteq QNR \mid z) \in \left[\frac{1}{2^t} - \delta(K), \frac{1}{2^t} + \delta(K)\right]$, where $K$ is the number of bits required to represent $N$ and $\delta(K)$ goes to zero as $K$ increases.

A formal proof of the XOR assumption looks difficult to obtain. Some exper- 
iments were conducted to verify the XOR assumption for t = 2 and small values 
of N . We briefly mention the important observations. 

1. From simulation it is observed that 5{K) depends on ^~jUi — ■ As this ratio 
increases from 0 to an upper bound of $\eta(K) > 0$, the value of $\delta(K)$ falls rapidly. Further, the upper bound $\eta(K)$ decreases exponentially with increasing $K$.

2. The nearer the ratio $p/q$ of the two prime factors of $N$ is to 1, the smaller is the value of $\delta(K)$.

3. The XOR assumption can be generalized to any probabilistic encryption scheme; there is some supporting evidence which we are unable to present here for lack of space.

We consider three cases which can occur from the possible nature of the query
sent by user:

First set contains more than one QNR : Let the first set in the query sent
by user contain $p$ many QNR's at positions $b_1, \ldots, b_p$. Then Lemma 2 implies
that in the reconstruction phase a number $z(a,\kappa)$ ($1 \le \kappa \le K$) obtained by
user is a QR iff $D_\kappa(a, b_1) \oplus \cdots \oplus D_\kappa(a, b_p) = 0$. Thus user is able to
reconstruct only $\psi(a, b_1) \oplus \cdots \oplus \psi(a, b_p)$.

Second set contains more than one QNR : Let the second set of numbers in
the query sent by user contain $q$ many QNR's at positions $a_1, a_2, \ldots, a_q$. Again
using Lemma 2 in the post processing computation, we see that a number $\zeta(r,\kappa)$
received by user is a QR iff $A_\kappa(r, a_1) \oplus \cdots \oplus A_\kappa(r, a_q) = 0$ ($1 \le r \le K$ and
$1 \le \kappa \le K$). Thus user will be able to reconstruct only
$z(a_1,\kappa) \oplus z(a_2,\kappa) \oplus \cdots \oplus z(a_q,\kappa)$.

Both sets contain more than one QNR : Let the first set contain more than one
QNR at positions $b_1, \ldots, b_p$ and the second set contain QNR's at positions
$a_1, a_2, \ldots, a_q$. Then a number $\zeta(r,\kappa)$ received by user is a QR iff
$A_\kappa(r, a_1) \oplus \cdots \oplus A_\kappa(r, a_q) = 0$ ($1 \le r \le K$ and $1 \le \kappa \le K$). Thus user will be
able to reconstruct only $z(a_1,\kappa) \oplus z(a_2,\kappa) \oplus \cdots \oplus z(a_q,\kappa)$ ($1 \le \kappa \le K$).

Thus in all three cases user will be able to reconstruct only XORs of more
than one number, and the XOR assumption says that from the XOR of two or
more numbers from the set $Z_N^{+1}$ it is not possible to know anything about the
quadratic characters of the constituent numbers. Hence if user sends more than
one QNR in any set of numbers in his query, he fails to get even one bit of the
database $x$.

Communication complexity : Total communication from user to database
DB in this scheme is $(1 + t + s)$ $K$-bit numbers $(N, y_1, \ldots, y_t, \gamma_1, \ldots, \gamma_s)$, while
database returns $K^2$ $K$-bit numbers $\zeta(r,\kappa)$ ($1 \le r \le K$, $1 \le \kappa \le K$)
obtained after the postprocessing to user. Thus the communication complexity
is $(1 + t + s + K^2) \cdot K$ bits, which can be minimized by choosing $t = s = \sqrt{n}$, and
the communication complexity is then $(2\sqrt{n} + 1 + K^2) \cdot K$. Under similar assumptions
on the security parameter Kushilevitz and Ostrovsky [7] obtained the
communication complexity $(2\sqrt{n} + 1) \cdot K$ for their basic PIR scheme. A closer look
reveals that, with an extra communication of $K^3$ bits over the basic PIR scheme
presented by Kushilevitz and Ostrovsky [7], we have successfully obtained a SPIR
scheme. Even with the weaker assumption on security parameter, i.e., $K = n^{\epsilon}$
for some constant $\epsilon > 0$, we get a communication complexity $O(n^{\frac{1}{2}+\epsilon})$, provided
$\epsilon < \frac{1}{4}$. Thus we have proved the following theorem :

Theorem 1. For every $\frac{1}{4} > \epsilon > 0$, there exists a single-server, single-round
SPIR protocol with communication complexity $O(n^{\frac{1}{2}+\epsilon})$ where user privacy is
based on QRA and the database privacy is based on the XOR assumption.
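
As a worked check of where the restriction on $\epsilon$ comes from, one can substitute $K = n^{\epsilon}$ into the cost of the basic scheme with $t = s = \sqrt{n}$. This is only a sketch of the arithmetic, using the cost expression stated in the communication complexity paragraph:

```latex
\begin{align*}
(2\sqrt{n} + 1 + K^2)\cdot K \,\Big|_{K = n^{\epsilon}}
  &= 2\,n^{\frac{1}{2}+\epsilon} + n^{\epsilon} + n^{3\epsilon}, \\
n^{3\epsilon} \le n^{\frac{1}{2}+\epsilon}
  &\iff \epsilon \le \tfrac{1}{4}.
\end{align*}
```

So for $\epsilon < \frac{1}{4}$ the $K^3$ term contributed by the post-processing never dominates, and the total stays within $O(n^{\frac{1}{2}+\epsilon})$.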
D Iterative Bit SPIR Scheme



In this section we develop an improved scheme using the ideas developed in our
basic scheme, and we manage to bring down the communication complexity. We
essentially achieve this by allowing the DB to do a recursive computation (see
also [7]). We stress that the scheme involves only a single round
of communication and that the security of database as well as user is preserved.

As before, we view the $n$-bit database string $x$ to be an $(s \times t)$ matrix of bits
denoted by $D$. The $i$th bit $x_i$ of the database $x$ is the $(a_1, b_1)$th entry in matrix $D$,
where $(a_1, b_1) = getIndex(i, t)$.

Query Generation :

1. User generates two $\frac{K}{2}$-bit prime numbers $p$ and $q$ with $p, q \equiv 3 \bmod 4$, and
calculates $N = pq$.

2. User calculates $t$ such that $t^{L+1} = n$, where $L$ is the depth of recursion of
the database computation. The value of $L$ is chosen such that communication
complexity is minimized (as we will see later).

3. User generates a sequence of pairs of indices $(a_j, b_j)$ as follows.

for $j \leftarrow 1$ to $L - 1$ { $(a_{j+1}, b_{j+1}) = getIndex(a_j, t)$ }

The pair $(a_j, b_j)$ corresponds to the row and column index of the relevant bit in
the matrices in the $j$th round of DB computation. Also define $s_j = \frac{n}{t^j}$ and $t_j = t$
for $1 \le j \le L$; $(s_j, t_j)$ are the numbers of rows and columns in each matrix in the
$j$th round of DB computation.

4. User generates an $L \times t$ matrix $y$, where for $1 \le \sigma \le L$, $1 \le \beta \le t$:
$y(\sigma, \beta)$ is a QNR if $\beta = b_\sigma$, else it is a QR. Clearly each row of $y$ contains
exactly one QNR.

5. User randomly chooses $s_L$ numbers $\gamma_1, \ldots, \gamma_{s_L} \in Z_N^{+1}$, such that $\gamma_{a_L}$ is a QNR
and $\gamma_j$ ($j \neq a_L$) is a QR.

6. User sends $N$, the matrix $y$ and the numbers $\gamma_1, \ldots, \gamma_{s_L}$ to the DB. The
factorization of $N$ is kept secret.
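
Step 3 of the query generation can be sketched as follows. The definition of getIndex lies outside this section, so the row-major, 1-based convention used in `get_index` is an assumption, as are the function names; `selector_row` only marks where the single QNR of a query row would sit, without generating actual residues.

```python
def get_index(i, t):
    """Map a 1-based position i to a 1-based (row, column) pair in a matrix
    with t columns. The row-major convention is an assumption."""
    if i < 1 or t < 1:
        raise ValueError("i and t must be positive")
    return (i - 1) // t + 1, (i - 1) % t + 1

def index_sequence(i, t, depth):
    """Return [(a_1, b_1), ..., (a_L, b_L)] for recursion depth L = depth."""
    pairs = [get_index(i, t)]
    for _ in range(depth - 1):
        row, _col = pairs[-1]
        # The row index of one level is re-split into (row, column) at the next.
        pairs.append(get_index(row, t))
    return pairs

def selector_row(t, b):
    """Layout of one length-t query row: the QNR sits at 1-based position b."""
    return ["QNR" if beta == b else "QR" for beta in range(1, t + 1)]
```

For example, with $t = 3$ and $L = 2$ (so $n = 27$), `index_sequence(17, 3, 2)` yields the pairs for bit 17, and `selector_row(3, b)` gives the shape of the corresponding row of $y$.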

Database Computation :

Database DB performs an $L + 2$ round computation in three phases. The first
phase is the preprocessing round of the basic scheme, while the third phase is the
post processing round. The middle step is an $L$ round recursive computation.

1. Pre Processing (Column Control) : As in the basic scheme, DB forms
$K$ matrices $D_\kappa(\alpha, \beta)$ from the original matrix $D(\alpha, \beta)$. Again the user requires
exactly one bit from each of the matrices $D_\kappa$.

2. Iterative Computation: The database DB performs an $L$ round recursive
computation according to the following algorithm:

For each $D_\kappa$ ($1 \le \kappa \le K$) formed in the preprocessing round perform the call
DFSCompute($D_\kappa$, $s$, $t$, $y_1$, 1).

The algorithm DFSCompute is described below:

DFSCompute($M$, $s$, $t$, $y_l$, $l$) {

/* • $y_l$ is the $l$th row of the matrix of numbers sent by user. Note $y_l[i] = y(l, i)$
is a QR if $i \neq b_l$ and $y_l[i]$ is a QNR if $i = b_l$.

• $M$ is an $s \times t$ matrix of bits and we want to retrieve the bit $M[a_l, b_l]$.
• $l$ denotes the level of the recursion. */

1. Set for $1 \le i \le s$, $z[i] = \nu_i \prod_{j=1}^{t} (y_l[j])^{M[i,j]}$, where $\nu_i$ for each $i$ is a QR
chosen by DB uniformly at random.

/* Each $z[i]$ is a $K$-bit string. For $1 \le j \le K$, let $z[i,j]$ denote the $j$th bit of $z[i]$.
We require the string $z[a_l]$ */

2. For $1 \le j \le K$, form $K$ matrices $M_j$, where each $M_j$ is an $\frac{s}{t} \times t$ matrix formed
from the column vector $z[*,j] = z[1,j], \ldots, z[s,j]$ by breaking $z[*,j]$ into $t$-bit
blocks and arranging the blocks in a row wise fashion.

/* The string $z[a_l]$ is distributed over the $K$ matrices $M_j$, i.e., the string $z[a_l]$ is
equal to $M_1[a_{l+1}, b_{l+1}], \ldots, M_K[a_{l+1}, b_{l+1}]$, where $(a_{l+1}, b_{l+1}) = getIndex(a_l, t)$.
*/

3. for ($1 \le j \le K$) {
   if ($l < L - 1$) DFSCompute($M_j$, $\frac{s}{t}$, $t$, $y_{l+1}$, $l + 1$)
   else PostCompute($M_j$, $\frac{s}{t}$, $t$, $y_{l+1}$, $\gamma$)
}

}

The routine PostCompute(·) is the postprocessing step and is described below:

3. Post Processing (Row Control):

PostCompute($M$, $s$, $t$, $y$, $\gamma$) {

/* • As in DFSCompute, $M$ is an $s \times t$ matrix of bits and we want to retrieve
the bit $M[a, b]$, where the indices $a$ and $b$ are hidden in $y$ and $\gamma$.

• $y[j]$ ($1 \le j \le t$) is an array of $t$ numbers (the $L$th row of the matrix $y$ sent by
user). $y[j]$ is a QNR if $j = b$, else it is a QR.

• $\gamma[j]$ ($1 \le j \le s$) is an array of $s$ numbers ($\gamma$ sent by user). $\gamma[j]$ is a QNR
if $j = a$, else it is a QR. */

1. Set for $1 \le i \le s$, $z[i] = \nu_i \prod_{j=1}^{t} (y[j])^{M[i,j]}$, where $\nu_i$ for each $i$ is a QR
chosen by DB uniformly at random.

/* Each $z[i]$ is a $K$-bit string. For $1 \le j \le K$, let $z[i,j]$ denote the $j$th bit of $z[i]$.
We require the string $z[a]$ */

2. Set $M'[i,j] = z[j,i]$ for $1 \le j \le s$, $1 \le i \le K$.

/* $M'$ is a $K \times s$ matrix of bits. */

3. Set for $1 \le i \le K$, $\zeta[i] = \nu_i \prod_{j=1}^{s} (\gamma[j])^{M'[i,j]}$, where $\nu_i$ for each $i$ is a QR
chosen by DB uniformly at random.

4. Output the strings $\zeta[1], \ldots, \zeta[K]$. These are sent to the user.

}

Reconstruction Phase and Correctness :

We show that from the output of PostCompute(·), it is possible to reconstruct
the $(a_1, b_1)$th bit of matrix $D$, i.e., the $i$th bit of database $x$. This will establish
the correctness of the protocol and also provide the method by which user can
reconstruct the $i$th bit of database $x$. (We assume that the user's query is properly
formed.)

1. Suppose the call PostCompute($M$, $s$, $t$, $y$, $\gamma$) outputs the strings $\zeta[1], \ldots, \zeta[K]$
and the QNR's of $y$ and $\gamma$ are at the $b$th and $a$th positions respectively. The value
of $\zeta[i]$ is a QR iff $M'[i,a]$ is 0, so it is possible to reconstruct the column vector
$M'[*,a]$, which is equal to the row vector $z[a,*] = z[a]$. Again $z[a]$ is a QR iff
$M[a,b]$ is 0. Thus it is possible to find $M[a,b]$. So it is possible to get the bits
at level $L - 1$. Now we use backward induction on the depth of recursion.

2. Suppose the call DFSCompute($M$, $s$, $t$, $y_l$, $l$) produces the set of matrices
$M_1, \ldots, M_K$. By the induction hypothesis it is possible to get the bits $M_i[a_{l+1}, b_{l+1}]$.
We show it is possible to get $M[a_l, b_l]$ from these. From the construction of $z[a_l]$
we find that it is equal to $M_1[a_{l+1}, b_{l+1}], \ldots, M_K[a_{l+1}, b_{l+1}]$, so it is possible
to get $z[a_l]$. The quantity $z[a_l]$ is a QR iff $M[a_l, b_l]$ is 0. Thus we get the bit
$M[a_l, b_l]$. This proves the induction. Hence the user can get the $(a_1, b_1)$th bit
of all the $K$ matrices $D_\kappa$ passed in the calls DFSCompute($D_\kappa$, $s$, $t$, $y_1$, 1).
Thus user gets the bits $D_1[a_1, b_1], \ldots, D_K[a_1, b_1]$, which on concatenation give
the number $\psi(a_1, b_1)$ by which DB had replaced the $(a_1, b_1)$th bit of matrix $D$.
Again $\psi(a_1, b_1)$ is a QR iff $D(a_1, b_1)$ is 0. Thus user is able to obtain the desired
bit.
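
The base case of this argument (one PostCompute call followed by the user's two-stage decoding) can be sketched in code. This is a toy illustration under loud assumptions: the primes 7 and 11, 0-based indices, least-significant-bit-first ordering of $z[i]$, a single shared random generator for user and DB, and every function name are choices made here for brevity and are not taken from the paper.

```python
import random
from math import gcd

P_FACTOR, Q_FACTOR = 7, 11          # toy primes (assumption), both 3 mod 4
N = P_FACTOR * Q_FACTOR
K = N.bit_length()                  # every number mod N fits in K bits

def is_qr(x):
    """User-side test; it needs the factorization of N (Euler's criterion)."""
    return (pow(x, (P_FACTOR - 1) // 2, P_FACTOR) == 1 and
            pow(x, (Q_FACTOR - 1) // 2, Q_FACTOR) == 1)

def random_qr(rng):
    while True:
        x = rng.randrange(1, N)
        if gcd(x, N) == 1:
            return x * x % N

def random_qnr(rng):
    # -1 is a non-residue modulo each prime that is 3 mod 4, so (-1) * QR
    # is a non-square with Jacobi symbol +1.
    return (N - 1) * random_qr(rng) % N

def selector(length, pos, rng):
    """A list with a QNR at 0-based index pos and QRs elsewhere."""
    return [random_qnr(rng) if j == pos else random_qr(rng)
            for j in range(length)]

def mask_rows(matrix, nums, rng):
    """For each row: nu * prod(nums[j] ** row[j]) mod N with a fresh QR nu."""
    out = []
    for row in matrix:
        z = random_qr(rng)
        for bit, x in zip(row, nums):
            if bit:
                z = z * x % N
        out.append(z)
    return out

def post_compute(matrix, y, gamma, rng):
    z = mask_rows(matrix, y, rng)                       # column control
    m_prime = [[(z[j] >> i) & 1 for j in range(len(z))]
               for i in range(K)]                       # K x s bit matrix
    return mask_rows(m_prime, gamma, rng)               # row control

def reconstruct(zeta):
    # zeta[i] is a QR iff bit i of z[a] is 0; z[a] is a QR iff M[a][b] is 0.
    z_a = sum((0 if is_qr(c) else 1) << i for i, c in enumerate(zeta))
    return 0 if is_qr(z_a) else 1

def retrieve(matrix, a, b, seed=0):
    rng = random.Random(seed)        # shared rng only to keep the sketch short
    y = selector(len(matrix[0]), b, rng)
    gamma = selector(len(matrix), a, rng)
    return reconstruct(post_compute(matrix, y, gamma, rng))
```

With a small bit matrix `M`, `retrieve(M, a, b)` should return `M[a][b]` for every position, which mirrors step 1 of the correctness argument.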

Privacy of user and database can be shown similarly as in the case of basic 
scheme. We omit the details due to lack of space. 

Communication Complexity

Communication from user to DB is (1) a $K$-bit number $N$, (2) an $L \times t$ query
matrix $y$ of $K$-bit numbers and (3) an array of $K$-bit numbers of length $t$. Thus
total communication from user to database DB in this scheme is $(1 + (L+1) \cdot t) \cdot K$.
The DB returns numbers computed at the end of the PostCompute(·) routine. We
analyze the tree structure formed from the computation process of DB. The
root of the computation tree is the matrix $D$ formed from the original database $x$.
Now in the preprocessing computation DB obtains $K$ matrices $D_\kappa$ ($1 \le \kappa \le K$)
from the matrix $D$. Each of these $K$ matrices becomes a child of the root.
Thus the root node, designated at level 0, has $K$ children (all at level 1). The call
of routine DFSCompute(·) at the $l$th level of recursion takes a matrix at level $l$ as
input and produces $K$ matrices as output. Thus each of the nodes at level $l < L$
has $K$ children. Matrices at level $L$ are input to PostCompute(·), which produces
$K$ numbers of $K$ bits each, which are returned to user. Thus for each of the $K^L$
matrices (leaf nodes of the computation tree), user receives $K^2$ bits. Therefore the
total communication from DB to user is $K^{L+2}$ bits. Hence the communication
complexity is $C(n) = (1 + (L+1) \cdot t + K^{L+1}) \cdot K$ bits, where $t = n^{\frac{1}{L+1}}$. If we choose
$L = \sqrt{\frac{\log n}{\log K}} - 1$, then
$C(n) = (1 + (\sqrt{\frac{\log n}{\log K}} + 1) \cdot 2^{\sqrt{\log n \cdot \log K}}) \cdot K$. Compare this

with the communication complexity 0(2^ '' obtained by Kushilevitz 
and Ostrovsky [7] for their PIR scheme. Thus we have a single-round SPIR
scheme with the communication complexity even smaller than the PIR scheme
of [7].

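Even with the weaker assumption on security parameter, i.e., $K = n^{\epsilon}$ for
some constant $\epsilon > 0$, we get a communication complexity
$O(\frac{1}{\sqrt{\epsilon}} \cdot n^{\sqrt{\epsilon}+\epsilon}) = O(n^{\sqrt{\epsilon}+\epsilon})$,
which is better than the communication complexity obtained
in [7] under the same assumption. If we take $K = \log^{c} n$, then we get the
communication complexity of
$O(\sqrt{\frac{\log n}{c \log\log n}} \cdot 2^{\sqrt{c \cdot \log n \cdot \log\log n}} \cdot \log^{c} n)$.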
Hence we have proved the following theorem :

Theorem 2. For every $\epsilon > 0$, there exists a single-server single-round SPIR
protocol with communication complexity $O(n^{\epsilon})$, where user privacy is based on
QRA and database privacy is based on the XOR assumption.
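
To get a feel for the trade-off in the recursion depth, the cost expression $C(n) = (1 + (L+1) \cdot t + K^{L+1}) \cdot K$ can be evaluated numerically. The sketch below is illustrative only: it rounds $t$ up to an integer with $t^{L+1} \ge n$, searches depths by brute force instead of using the closed form for $L$, and all function names are assumptions.

```python
def ceil_root(n, r):
    """Smallest integer t with t**r >= n, using exact integer arithmetic."""
    lo, hi = 1, max(1, n)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** r >= n:
            hi = mid
        else:
            lo = mid + 1
    return lo

def iterative_cost_bits(n, K, L):
    """C(n) = (1 + (L+1)*t + K**(L+1)) * K with t = ceil(n ** (1/(L+1)))."""
    t = ceil_root(n, L + 1)
    return (1 + (L + 1) * t + K ** (L + 1)) * K

def best_depth(n, K, max_depth=16):
    """Depth in 1..max_depth that minimises the cost (brute-force search)."""
    return min(range(1, max_depth + 1),
               key=lambda L: iterative_cost_bits(n, K, L))
```

At $L = 1$ the expression reduces to $(1 + 2\sqrt{n} + K^2) \cdot K$, the cost of the basic scheme, which gives a convenient sanity check.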

E Block Retrieval SPIR Scheme 

In the previous section we presented a scheme for retrieving a bit from a database
modeled as an array of bits. But a more realistic view of a database is to assume
it is partitioned into blocks rather than bits. We view database $x$ as an array of
records, each of size $m$, having $n$ records in total. User wants to retrieve the $i$th
record. The number of records $n$ and the number of bits in a record $m$ determine $L$,
as $L = \lceil \frac{\log n}{\log m} \rceil - 1$. The value of $L$ determines the recursion depth of the
database computation.

For the purpose of DB computation, database $x$ held by DB is viewed as a
stack of $m$ matrices $D_\mu$ ($1 \le \mu \le m$), where each matrix is an $s \times t$ matrix
of bits, and user wants to retrieve the bits $D_\mu(a_1, b_1)$. Now to retrieve the $i$th
record from database $x$, user generates a query following the protocol in our bit
retrieval scheme, but taking the value of $L$ defined as above. DB applies the
query sent by user on all the $m$ matrices in the stack, and sends back to user the
answer obtained for each of the matrices in the stack. As user can obtain the $i$th bit
of each of the matrices, he will get the $i$th record. Correctness, privacy of user
and privacy of database follow from the underlying bit retrieval protocol.

The communication complexity in this scheme is
$C(m, n) = (1 + (L+1) \cdot n^{\frac{1}{L+1}} + m \cdot K^{L+1}) \cdot K$. Therefore, we have proved the
following theorem:

Theorem 3. There exists a block retrieval SPIR protocol with communication
complexity linear in the number of bits in the record $m$ and polynomial in the security
parameter $K$, i.e., we have $O(m \cdot K^{L+2})$, where $m$ is the number of bits in the
record and $L = \lceil \frac{\log n}{\log m} \rceil - 1$, where user privacy is based on QRA and database
privacy is based on the XOR assumption.

Corollary 1. For $n \le m$, i.e., number of records not more than number of bits
in any record, we get $L = 0$, and communication complexity
$C = (1 + n + m \cdot K) \cdot K \le m \cdot (K + K) K$, i.e., $C = O(m K^2)$. For $m < n \le m^2$, we get
$L = 1$ and communication complexity $C = O(m \cdot K^3)$. In general, $n^{\frac{1}{L+1}} = m$ and
$(L + 1) < K^{(L+1)}$, thus communication complexity $C = O(m \cdot K^{L+2})$.
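
The recursion depth used in the block scheme can be computed without floating-point logarithms. The sketch below is a hedged illustration: `block_depth` returns the smallest $L \ge 0$ with $m^{L+1} \ge n$ (which agrees with $\lceil \log n / \log m \rceil - 1$ for $n > 1$), and `block_cost_order` is just the order-of-growth term $m \cdot K^{L+2}$ from Theorem 3; both names are assumptions.

```python
def block_depth(n, m):
    """Smallest L >= 0 with m**(L+1) >= n; for n > 1 this equals
    ceil(log n / log m) - 1."""
    if m < 2 or n < 1:
        raise ValueError("need m >= 2 and n >= 1")
    depth, power = 0, m
    while power < n:
        power *= m
        depth += 1
    return depth

def block_cost_order(n, m, K):
    """Dominant term m * K**(L+2) of the block-retrieval cost, in bits."""
    return m * K ** (block_depth(n, m) + 2)
```

The three regimes of Corollary 1 correspond to `block_depth` returning 0 when $n \le m$, 1 when $m < n \le m^2$, and larger values beyond that.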

F Conclusion 

In this paper we have presented a single-server, single-round SPIR protocol.
The communication complexity of the protocol can be made $O(n^{\epsilon})$ for any
$\epsilon > 0$. Further, the scheme has been extended to efficient block retrieval protocols.
Some of the ideas used in the construction of the SPIR protocol are based on the
PIR protocol in [7]. In Mishra [8], it is shown that there exist PIR and SPIR
schemes having communication complexity $O(K \log n)$ (where $K$ is the security
parameter and $n$ is the size of the database) provided there exist probabilistic
encryption schemes with certain desirable properties.
Abstract. This paper proposes three key agreement protocols that emphasize
security and performance. First, a two-pass authenticated
key agreement (AK) protocol is presented in the asymmetric setting;
it is based on Diffie-Hellman key agreement working over an elliptic
curve group and provides more desirable security attributes than the
MTI/A0, two-pass Unified Model and two-pass MQV protocols. The other
two protocols are modifications of this protocol: the three-pass authenticated
key agreement with key confirmation (AKC) protocol, which uses
message authentication code (MAC) algorithms for key confirmation,
and the two-pass authenticated key agreement protocol with unilateral
key confirmation, which uses the MAC and the signature.



A Introduction 

A key establishment protocol is a process whereby a shared secret key becomes
available to participating entities, for subsequent cryptographic use. It is broadly
subdivided into key transport and key agreement protocols [18]. In key transport
protocols, a key is created by one entity and securely transmitted to the other 
entity. In key agreement protocols, both entities contribute information to gen- 
erate the shared secret key. Key establishment protocol employs symmetric or 
asymmetric key cryptography. A symmetric cryptographic system is a system 
involving two transformations - one for the initiator and one for the responder - 
both of which make use of the same secret key [18] . In this system the two entities 
previously possess common secret information, so the key management problem 
is a crucial issue. An asymmetric cryptographic system is a system involving 
two related transformations - one defined by a public key (the public transfor- 
mation), and another defined by a private key (the private transformation) - 
with the property that it is computationally infeasible to determine the private 
transformation from the public transformation [IB]. In this system the two enti- 
ties share only public information that has been authenticated [B]. This paper is 
concerned with two-entity authenticated key agreement protocol working over 
an elliptic curve group in the asymmetric (public-key) setting. 

The goal of any authenticated key establishment protocol is to establish
keying data. Ideally, the established key should have precisely the same attributes
as a key established face-to-face. However, it is not an easy task to identify
the precise security requirements of authenticated key establishment. Several
concrete security and performance attributes have been identified as desirable [4].

The fundamental security goals of key establishment protocols are said to
be implicit key authentication and explicit key authentication. Let A and B be
two honest entities, i.e., legitimate entities who execute the steps of a protocol
correctly. Informally speaking, a key agreement protocol is said to provide
implicit key authentication (IKA) (of B to A) if entity A is assured that no other
entity aside from a specifically identified second entity B can possibly learn the
value of a particular secret key. A key agreement protocol which provides
implicit key authentication to both participating entities is called an authenticated
key agreement (AK) protocol. A key agreement protocol is said to provide key
confirmation (of B to A) if entity A is assured that the second entity B actually
has possession of a particular secret key. If both implicit key authentication and
key confirmation (of B to A) are provided, the key establishment protocol is
said to provide explicit key authentication (EKA) (of B to A). A key agreement
protocol which provides explicit key authentication to both entities is called an
authenticated key agreement with key confirmation (AKC) protocol [4][14][18].

Desirable security attributes of AK and AKC protocols are known-key
security (a protocol should still achieve its goal in the face of an adversary who
has learned some other session keys, i.e., the unique secret keys which each run of a
key agreement protocol between A and B should produce), forward secrecy (if
static private keys of one or more entities are compromised, the secrecy of previous
session keys is not affected), key-compromise impersonation (when A's static
private key is compromised, it may be desirable that this event does not enable
an adversary to impersonate other entities to A), unknown key-share (entity B
cannot be coerced into sharing a key with entity A without B's knowledge, i.e.,
when B believes the key is shared with some entity $C \neq A$, and A correctly
believes the key is shared with B), etc. [4][14][18]. Desirable performance attributes of
AK and AKC protocols include a minimal number of passes, low communication
overhead, low computation overhead and possibility of precomputations |1]. 

This paper first proposes a new two-pass AK protocol in the asymmetric
setting, which is based on Diffie-Hellman key agreement working over an
elliptic curve group. The protocol provides more desirable security attributes
than the MTI/A0, Unified Model and MQV AK protocols. Secondly, an AKC
protocol is derived from the AK protocol by using Message Authentication
Code (MAC) algorithms. Finally, a two-pass unilateral AKC protocol is designed
to provide mutual IKA and unilateral EKA, and known-key security, forward 
secrecy, key-compromise impersonation, and unknown key-share attributes dis- 
cussed in m- 

The remainder of this paper is organized as follows. Section 2 reviews AK
protocols such as the MTI/A0, Unified Model and MQV protocols. Section 3 presents
the new AK protocol and its two variants: the AKC protocol and the two-pass
unilateral AKC protocol. Section 4 compares the key agreement protocols suggested in
Sections 2 and 3. Finally, Section 5 makes concluding remarks.

B AK Protocols

This section describes the MTI/A0, two-pass Unified Model, and two-pass MQV
protocols. The security of AK protocols in this paper is based on the Diffie- 
Hellman ‘problem, \\ flj in elliptic curve group: given an elliptic curve E defined 
over a finite field $F_q$, a base point $P \in E(F_q)$ of order $n$ and two points generated
by $P$, $xP$ and $yP$ (where $x$ and $y$ are integers), find $xyP$. This problem is closely
related to the well-known elliptic curve discrete logarithm problem (ECDLP)
(given $E(F_q)$, $P$, $n$ and $xP$, find $x$) [7] and there is strong evidence that the two
problems are computationally equivalent {e.g., see [5| and US]). 

All protocols in this paper have been described in the setting of the group of
points on an elliptic curve defined over a finite field. The following abbreviations
are used for clear understanding: IKA denotes implicit key authentication, EKA
explicit key authentication, K-KS known-key security, FS forward secrecy, K-CI
key-compromise impersonation, and UK-S unknown key-share.

We first present the elliptic curve parameters that are common to both
entities involved in the protocols (i.e., the domain parameters), and the key pairs
of each entity. These are the same ones used in [14].

Domain Parameter 

The domain parameters consist of a suitably chosen elliptic curve $E$ defined
over a finite field $F_q$ of characteristic $p$, and a base point $P \in E(F_q)$ [11]:

1. a field size $q$, where $q$ is a prime power (in practice, either $q = p$, an odd
prime, or $q = 2^m$);

2. an indication FR (field representation) of the representation used for the
elements of $F_q$;

3. two field elements $a$ and $b$ in $F_q$ which define the equation of the elliptic
curve $E$ over $F_q$;

4. two field elements $x_P$ and $y_P$ in $F_q$ which define a finite point $P = (x_P, y_P)$
of prime order in $E(F_q)$;

5. the order $n$ of the point $P$; and

6. the cofactor $c = \#E(F_q)/n$.

A set of domain parameters $(q, FR, a, b, P, n, c)$ can be verified to meet the
above requirements (see m for more details). This process is called domain 
parameter validation. 

Key Pairs 

Given a valid set of domain parameters $(q, FR, a, b, P, n, c)$, an entity's private
key is an integer $d$ selected at random from the interval $[1, n-1]$, and its public
key is the elliptic curve point $Q = dP$. The key pair is $(Q, d)$. For the protocols
described in this paper, each entity has two key pairs: a static or long-term key
pair (which is bound to the entity for a certain period of time), and an ephemeral
or short-term key pair (which is generated for each run of the protocol) [11].
A's static key pair and ephemeral key pair are denoted $(W_A, w_A)$ and $(R_A, r_A)$
respectively.

For the remainder of this paper, we will assume that static public keys are
exchanged via certificates. $Cert_A$ denotes A's public-key certificate, containing a
string of information that uniquely identifies A (such as A's name and address),
her static public key $W_A$, and a certifying authority (CA)'s signature over this
information. To avoid a potential unknown key-share attack, the CA should
verify that A possesses the private key $w_A$ corresponding to her static public
key $W_A$ [14].

Public-Key Validation 

Before using an entity’s purported public key Q, it is prudent to verify that it 
possesses the supposed arithmetic properties - namely that Q be a finite point in 
the subgroup generated by PM- This process is called public-key validation\12\. 
A purported public key $Q = (x_Q, y_Q)$ can be validated by verifying that:

1. $Q$ is not equal to $O$, which denotes the point at infinity;

2. $x_Q$ and $y_Q$ are elements in the field $F_q$;

3. $Q$ satisfies the defining equation of $E$; and

4. $nQ = O$.

The computationally expensive operation in public-key validation is the scalar 
multiplication in step 4. To reduce this burden, step 4 can be omitted during 
key validation. It is called embedded public-key validation m- For ease of expla- 
nation, public key validation is omitted from the descriptions of protocols of this 
section. 
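
The four validation checks, and the embedded variant that skips the scalar multiplication, can be sketched on a toy curve. Everything concrete here is an assumption made for illustration: the curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $(5, 1)$ of prime order 19 is a small textbook example far too small for real use, and the function names are invented for this sketch.

```python
A, B, P_FIELD = 2, 2, 17            # toy curve y^2 = x^3 + 2x + 2 over F_17
BASE_POINT, ORDER = (5, 1), 19      # base point of prime order n = 19
INFINITY = None                     # the point at infinity O

def ec_add(p1, p2):
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY                                  # p2 == -p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD

def ec_mul(k, point):
    result = INFINITY
    while k > 0:                                         # double-and-add
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

def on_curve(point):
    x, y = point
    return (y * y - (x ** 3 + A * x + B)) % P_FIELD == 0

def validate_public_key(q, embedded=False):
    """Checks 1-4; with embedded=True the costly check 4 (nQ = O) is skipped."""
    if q is INFINITY:
        return False                                     # check 1
    x, y = q
    if not (0 <= x < P_FIELD and 0 <= y < P_FIELD):
        return False                                     # check 2
    if not on_curve(q):
        return False                                     # check 3
    return embedded or ec_mul(ORDER, q) is INFINITY      # check 4
```

The modular inverse via `pow(x, -1, p)` needs Python 3.8 or later. On this toy curve the cofactor is 1, so full and embedded validation accept the same points; on curves with a larger cofactor the two differ.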



B.1 MTI/A0

The MTI/A0 key agreement protocol was proposed by Matsumoto et al. in
1986 [17]. It was designed to provide implicit key authentication. The similar
protocols are KEA[inj and those proposed by Goss[TT] and YacobipO]. 

Protocol 1

1. A generates a random integer $r_A$, $1 \le r_A \le n-1$, computes the point
$R_A = r_A P$, and sends $(R_A, Cert_A)$ to B.

2. B generates a random integer $r_B$, $1 \le r_B \le n-1$, computes the point
$R_B = r_B P$, and sends $(R_B, Cert_B)$ to A.

3. A computes $K = r_A W_B + w_A R_B$.

4. B computes $K = r_B W_A + w_B R_A$.

5. The session key is $k = kdf(K)$ (where kdf is a key derivation function).

Remark

- A and B commonly compute $K = (r_A w_B + r_B w_A)P$.

- It does not provide FS since an adversary who learns $w_A$ and $w_B$ can compute
all session keys established by A and B.
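
Both remarks follow directly from $W_A = w_A P$, $W_B = w_B P$, $R_A = r_A P$ and $R_B = r_B P$. The worked equations below are a sketch of that reasoning, using only the notation of Protocol 1:

```latex
\begin{align*}
K_A &= r_A W_B + w_A R_B = r_A w_B P + w_A r_B P = (r_A w_B + r_B w_A)P, \\
K_B &= r_B W_A + w_B R_A = r_B w_A P + w_B r_A P = (r_A w_B + r_B w_A)P, \\
K   &= w_B R_A + w_A R_B
  \quad\text{(computable from the public } R_A, R_B \text{ once } w_A, w_B \text{ leak).}
\end{align*}
```

The last line shows why forward secrecy fails: the ephemeral private keys $r_A$ and $r_B$ are never needed by someone who holds both static private keys.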
B.2 Two-Pass Unified Model

The Unified Model proposed by Ankney et al. [1] is an AK protocol in the draft
standards ANSI X9.42 [21], ANSI X9.63 [22], and IEEE P1363 [23]. In the protocol,
the notation $\|$ denotes concatenation.

Protocol 2

1. A generates a random integer $r_A$, $1 \le r_A \le n-1$, computes the point
$R_A = r_A P$, and sends $(R_A, Cert_A)$ to B.

2. B generates a random integer $r_B$, $1 \le r_B \le n-1$, computes the point
$R_B = r_B P$, and sends $(R_B, Cert_B)$ to A.

3. A computes $Z_s = w_A W_B$ and $Z_e = r_A R_B$.

4. B computes $Z_s = w_B W_A$ and $Z_e = r_B R_A$.

5. The session key is $k = kdf(Z_s \| Z_e)$.

Remark

- A and B commonly compute $K = (w_A w_B P) \| (r_A r_B P)$.

- It does not possess the K-CI attribute, since an adversary who learns $w_A$
can impersonate any other entity B to A.

B.3 Two-Pass MQV 

The MQV protocol [14] was proposed by Law et al. and is in the draft standards ANSI
X9.42 [21], ANSI X9.63 [22], and IEEE P1363 [23].

The following notation is used. $f$ denotes the bitlength of $n$, the prime
order of the base point $P$; i.e., $f = \lfloor \log_2 n \rfloor + 1$. If $Q$ is a finite elliptic curve
point, then $\bar{Q}$ is defined as follows. Let $x$ be the x-coordinate of $Q$, and let $\bar{x}$ be
the integer obtained from the binary representation of $x$. (The value of $\bar{x}$ will
depend on the representation chosen for the elements of the field $F_q$.) Then $\bar{Q}$ is
defined to be the integer $(\bar{x} \bmod 2^{\lceil f/2 \rceil}) + 2^{\lceil f/2 \rceil}$. Observe that $(\bar{Q} \bmod n) \neq 0$ [1].
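
The definition of $\bar{Q}$ is short enough to state in code. The sketch below assumes the x-coordinate has already been converted to an integer $\bar{x}$ (how that is done depends on the field representation) and uses an invented function name.

```python
def q_bar(x_int, n):
    """(x_int mod 2^ceil(f/2)) + 2^ceil(f/2), where f is the bit length of n."""
    f = n.bit_length()              # f = floor(log2 n) + 1
    half = (f + 1) // 2             # ceil(f / 2)
    return (x_int % (1 << half)) + (1 << half)
```

Because the result always has exactly `half + 1` bits, multiplying a point by it costs roughly half of a full-length scalar multiplication.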

Protocol 3

1. A generates a random integer $r_A$, $1 \le r_A \le n-1$, computes the point
$R_A = r_A P$, and sends $(R_A, Cert_A)$ to B.

2. B generates a random integer $r_B$, $1 \le r_B \le n-1$, computes the point
$R_B = r_B P$, and sends $(R_B, Cert_B)$ to A.

3. A computes $s_A = (r_A + \bar{R}_A w_A) \bmod n$, $K = s_A (R_B + \bar{R}_B W_B)$.

4. B computes $s_B = (r_B + \bar{R}_B w_B) \bmod n$, $K = s_B (R_A + \bar{R}_A W_A)$.

5. The session key is $k = kdf(K)$.

Remark

- A and B commonly compute $K = (s_A s_B)P = (r_A r_B + r_A w_B \bar{R}_B + r_B w_A \bar{R}_A +
w_A w_B \bar{R}_A \bar{R}_B)P$. The expression for $\bar{R}_A$ uses only half the bits of the
x-coordinate of $R_A$. This was done in order to increase the efficiency of
computing $K$, because the scalar multiplication $\bar{R}_A W_A$ can be done in half the
time of a full scalar multiplication [14].

- It does not provide the UK-S attribute, which has recently been observed by
Kaliski [13].

C New Key Agreement Protocols

This section proposes the new two-pass AK, AKC and two-pass unilateral AKC 
protocols. Domain parameters and static keys are set up and validated as de- 
scribed in Section 2. 

C.1 AK Protocol

Protocol 4 provides more desirable security attributes than AK protocols 
described in Section 2. 

Protocol 4 

If A and B do not previously possess authentic copies of each other’s static 
public keys, the certificates should be included in the fiowsjHj. 

1. A generates a random integer $r_A$, $1 \le r_A \le n-1$, computes the point
$R_A = r_A P$, and sends this to B.

2. B generates a random integer $r_B$, $1 \le r_B \le n-1$, computes the point
$R_B = r_B P$, and sends this to A.

3. A does an embedded key validation of $R_B$. If the validation fails, then
A terminates the protocol run with failure. Otherwise, A computes
$K = c r_A W_B + c(w_A + r_A) R_B$. If $K = O$, then A terminates the protocol run with
failure.

4. B does an embedded key validation of $R_A$. If the validation fails, then
B terminates the protocol run with failure. Otherwise, B computes
$K = c r_B W_A + c(w_B + r_B) R_A$. If $K = O$, then B terminates the protocol run with
failure.

5. The session key is $k = kdf(K)$.

Multiplication by $c$ ensures that the shared secret $K$ is a point in the subgroup
of order $n$ in $E(F_q)$ to protect against small subgroup attack as suggested in
C3- The check K = O ensures that AT is a finite point [Tl]. 

A and B compute the shared secret $K = c(r_A w_B + r_B w_A + r_A r_B)P$.
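
That both parties of Protocol 4 arrive at the same point can be checked on a toy curve. The sketch below is an illustration under stated assumptions only: the curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $(5, 1)$ of order 19 and cofactor 1 is a textbook-sized example, the key values are fixed rather than random, key validation and the kdf are omitted, and all function names are invented here.

```python
A, P_FIELD = 2, 17                  # toy curve y^2 = x^3 + 2x + 2 over F_17
BASE_POINT, ORDER, COFACTOR = (5, 1), 19, 1
INFINITY = None

def ec_add(p1, p2):
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD

def ec_mul(k, point):
    result = INFINITY
    while k > 0:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

def shared_secret(r_own, w_own, W_peer, R_peer, c=COFACTOR):
    """K = c*r_own*W_peer + c*(w_own + r_own)*R_peer (INFINITY if K is O)."""
    return ec_add(ec_mul(c * r_own % ORDER, W_peer),
                  ec_mul(c * (w_own + r_own) % ORDER, R_peer))

# Fixed toy keys (assumption): static (w, W) and ephemeral (r, R) pairs.
w_A, r_A, w_B, r_B = 3, 5, 7, 11
W_A, R_A = ec_mul(w_A, BASE_POINT), ec_mul(r_A, BASE_POINT)
W_B, R_B = ec_mul(w_B, BASE_POINT), ec_mul(r_B, BASE_POINT)

K_A = shared_secret(r_A, w_A, W_B, R_B)     # computed by A in step 3
K_B = shared_secret(r_B, w_B, W_A, R_A)     # computed by B in step 4
```

Both values equal $c(r_A w_B + r_B w_A + r_A r_B)P$; reducing the scalars modulo `ORDER` is valid here because every point involved lies in the subgroup of order $n$.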

Security 

Although the security of Protocol 4 has not been formally proven in a model 
of distributed computing [S], heuristic arguments suggest that Protocol 4 pro- 
vides mutual IKA[I1]; entity A is assured that no other entity aside from B can 
possibly learn the value of a particular secret key, since the secret key can be
computed only by an entity who knows B's private keys, $r_B$ and $w_B$, and vice versa.
In addition, Protocol 4 also appears to have many desirable security attributes
listed in Section 1 as follows.

The protocol provides known-key security. Each run of the protocol between
two entities A and B should produce a unique session key which depends on
$r_A$ and $r_B$. Although an adversary has learned some other session keys, he
cannot compute $r_A w_B P$, $r_B w_A P$ and $r_A r_B P$ from them, because he does not know
the ephemeral private keys $r_A$ and $r_B$. Therefore the protocol still achieves its goal
in the face of the adversary.

It also possesses forward secrecy provided that EKA of all session keys is
supplied. Suppose that the static private keys $w_A$ and $w_B$ of two entities are
compromised. The secrecy of previous session keys established by honest
entities is not affected, because by the elliptic curve Diffie-Hellman
problem it is difficult for an adversary to obtain the $r_A r_B P$ part of the shared
secret from $r_A P$ and $r_B P$, which he can know.

It resists key-compromise impersonation. Even if A's static private key $w_A$ is
disclosed, this loss does not enable an adversary to impersonate another entity B to
A, since the adversary still faces the elliptic curve Diffie-Hellman style problem
of working out $r_A w_B P$ from $w_A$, $r_B$, $r_A P$, $r_B P$, $w_A P$ and $w_B P$ to learn the
shared secret.

It also prevents unknown key-share. According to the assumption of this
protocol that the CA has verified that A possesses the private key $w_A$ corresponding
to her static public key $W_A$, an adversary E cannot register A's public key $W_A$
as its own and subsequently deceive B into believing that A's messages
originated from E. In addition, it does not have the duplicate-signature key
selection (DSKS) property [^. Therefore B cannot be coerced into sharing a key 
with entity A without B's knowledge.

Performance 

From A’s point of view, the dominant computational steps in a run of Pro- 
tocol 4 are the scalar multiplications r^P, rAWB and {wa + rA)RB- Refer to 
an efficient scalar multiplication method using Frobenius expansions suggested 
by Cheon et am. Hence the work required by each entity is 3 (full) scalar mul- 
tiplications. Since r^P and rAWB can be computed off-line by A, the on-line 
work required by each entity is only 1 scalar multiplications, (if A and P do 
previously possess authentic copies of each other’s static public key) 

Instead of k = kdf(c(r_A w_B + r_B w_A + r_A r_B)P), the following variations 
can be used as the session key. 

(i) k = kdf(c(r_A w_B + r_B w_A)P || c r_A r_B P). 

A computes k = kdf(c r_A W_B + c w_A R_B || c r_A R_B), and B computes 
k = kdf(c r_B W_A + c w_B R_A || c r_B R_A). 

(ii) k = kdf(c r_A w_B P || c r_B w_A P || c r_A r_B P). 

A computes k = kdf(c r_A W_B || c w_A R_B || c r_A R_B), and B computes 
k = kdf(c w_B R_A || c r_B W_A || c r_B R_A). 

The protocol using these session keys requires 4 scalar multiplications per 
entity; if precomputations are discounted, the number of on-line scalar 
multiplications per entity is 2. Using these variations therefore causes some 
degradation in the performance of the protocol. 
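
The two variations can be checked for agreement with a short sketch. As an assumption made purely for brevity, the elliptic curve group is replaced here by a toy prime-order subgroup of Z_p^* written multiplicatively, so a scalar multiplication xP becomes pow(g, x, p) and point addition becomes multiplication mod p; the cofactor is taken as 1, SHA-1 stands in for kdf, and the key values are arbitrary.

```python
import hashlib

# Toy group: g = 4 has prime order q = 1019 in Z_p^* with p = 2q + 1 = 2039 (not secure).
p, q, g = 2039, 1019, 4


def kdf(*parts):
    """Hash the concatenation of fixed-width encodings of the group elements."""
    data = b"".join(x.to_bytes(2, "big") for x in parts)
    return hashlib.sha1(data).hexdigest()


w_A, w_B, r_A, r_B = 123, 456, 321, 654      # arbitrary private values
W_A, W_B = pow(g, w_A, p), pow(g, w_B, p)
R_A, R_B = pow(g, r_A, p), pow(g, r_B, p)    # one exponentiation each (off-line)

# Variation (i): k = kdf(c(r_A w_B + r_B w_A)P || c r_A r_B P).
k1_A = kdf(pow(W_B, r_A, p) * pow(R_B, w_A, p) % p, pow(R_B, r_A, p))
k1_B = kdf(pow(W_A, r_B, p) * pow(R_A, w_B, p) % p, pow(R_A, r_B, p))

# Variation (ii): k = kdf(c r_A w_B P || c r_B w_A P || c r_A r_B P).
k2_A = kdf(pow(W_B, r_A, p), pow(R_B, w_A, p), pow(R_B, r_A, p))
k2_B = kdf(pow(R_A, w_B, p), pow(W_A, r_B, p), pow(R_A, r_B, p))
```

In both variations each entity performs four exponentiations in total, and the two that involve the peer's ephemeral value (R_B for A, R_A for B) cannot be precomputed, which matches the count of 2 on-line operations given above.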

C.2 AKC Protocol 

This section describes the AKC variant of Protocol 4. Protocol 5 (AKC 
protocol) is derived from Protocol 4 (AK protocol) by adding the MACs of the 
flow number, identities, and the ephemeral public keys. Here, MACs are used 
to provide key confirmation, and H_1 and H_2 are (independent) key derivation 
functions. Practical instantiations of H_1 and H_2 include H_1 = SHA-1(01, z) 
and H_2 = SHA-1(10, z) [11]. 

Protocol 5 

1. A generates a random integer r_A, 1 ≤ r_A ≤ n - 1, computes the point 
R_A = r_A P, and sends this to B. 

2. (a) B does an embedded key validation of R_A. If the validation fails, then 
B terminates the protocol run with failure. 

(b) Otherwise, B generates a random integer r_B, 1 ≤ r_B ≤ n - 1, and 
computes the point R_B = r_B P. 

(c) B computes K = c r_B W_A + c(w_B + r_B)R_A. If K = O, then B terminates 
the protocol run with failure. The shared secret is the point K. 

(d) B uses the x-coordinate z of the point K to compute two shared keys 
k = H_1(z) and k' = H_2(z). 

(e) B computes MAC_k'(2, ID_B, ID_A, R_B, R_A) and sends this together with 
R_B to A. 

3. (a) A does an embedded key validation of R_B. If the validation fails, then 
A terminates the protocol run with failure. 

(b) Otherwise, A computes K = c r_A W_B + c(w_A + r_A)R_B. If K = O, then 
A terminates the protocol run with failure. 

(c) A uses the x-coordinate z of the point K to compute two shared keys 
k = H_1(z) and k' = H_2(z). 

(d) A computes MAC_k'(2, ID_B, ID_A, R_B, R_A) and verifies that this equals 
what was sent by B. 

(e) A computes MAC_k'(3, ID_A, ID_B, R_A, R_B) and sends this to B. 

4. B computes MAC_k'(3, ID_A, ID_B, R_A, R_B) and verifies that this equals what 
was sent by A. 

5. The session key is k. 
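
The message flow of Protocol 5 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the elliptic curve is replaced by a toy multiplicative subgroup of Z_p^* (so xP becomes pow(g, x, p) and the identity 1 plays the role of O), the group element itself stands in for the x-coordinate z, HMAC-SHA1 is an assumed MAC, the byte encodings of the prefixes 01 and 10 are an assumption, and all names and key values are hypothetical.

```python
import hashlib
import hmac

p, q, g = 2039, 1019, 4          # toy group: g has prime order q in Z_p^* (not secure)
ID_A, ID_B = b"A", b"B"
w_A, w_B = 123, 456              # arbitrary static private keys
W_A, W_B = pow(g, w_A, p), pow(g, w_B, p)


def enc(x):
    return x.to_bytes(2, "big")


def H1(z):
    return hashlib.sha1(b"01" + enc(z)).digest()


def H2(z):
    return hashlib.sha1(b"10" + enc(z)).digest()


def mac(key, flow, id_from, id_to, R_from, R_to):
    msg = bytes([flow]) + id_from + id_to + enc(R_from) + enc(R_to)
    return hmac.new(key, msg, hashlib.sha1).digest()


def validate(R):
    """Stand-in for embedded key validation: R must be a non-identity element of order q."""
    if not (1 < R < p and pow(R, q, p) == 1):
        raise ValueError("key validation failed")


def shared(r, w, W_peer, R_peer):
    """Multiplicative form of K = r*W_peer + (w + r)*R_peer."""
    K = pow(W_peer, r, p) * pow(R_peer, (w + r) % q, p) % p
    if K == 1:
        raise ValueError("shared secret is the identity")
    return K


def run_protocol5(r_A, r_B):
    R_A = pow(g, r_A, p)                                   # flow 1: A -> B
    validate(R_A)                                          # step 2(a)
    R_B = pow(g, r_B, p)                                   # step 2(b)
    z_B = shared(r_B, w_B, W_A, R_A)                       # step 2(c)
    k_B, kc_B = H1(z_B), H2(z_B)                           # step 2(d)
    tag_B = mac(kc_B, 2, ID_B, ID_A, R_B, R_A)             # flow 2: B -> A
    validate(R_B)                                          # step 3(a)
    z_A = shared(r_A, w_A, W_B, R_B)                       # step 3(b)
    k_A, kc_A = H1(z_A), H2(z_A)                           # step 3(c)
    if not hmac.compare_digest(tag_B, mac(kc_A, 2, ID_B, ID_A, R_B, R_A)):
        raise ValueError("B's MAC rejected")               # step 3(d)
    tag_A = mac(kc_A, 3, ID_A, ID_B, R_A, R_B)             # flow 3: A -> B
    if not hmac.compare_digest(tag_A, mac(kc_B, 3, ID_A, ID_B, R_A, R_B)):
        raise ValueError("A's MAC rejected")               # step 4
    return k_A, k_B


k_A, k_B = run_protocol5(r_A=321, r_B=654)
```

The MAC key k' and the session key k come from separate derivations of the same z, so confirming k' does not expose k.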

Security 

AKC Protocol 5 is derived from AK Protocol 4 by adding key confirmation to 
the latter, using the fact that key confirmation of k' implies that of k. This 
is done in exactly the same way as AKC Protocol 2 of [5] was derived from AK 
Protocol 3 of [5]. Protocol 2 of [5] was formally proven to be a secure AKC 
protocol. Heuristic arguments suggest that Protocol 5 has all the desirable 
security attributes listed in Section 1 m. 

Performance 

Since MACs can be computed efficiently, this method of adding key confirmation 
to an AK protocol does not place a significant computational burden on 
the key agreement mechanism. However, the number of messages exchanged is 
increased by one [3]. 
C.3 Two-Pass Unilateral AKC Protocol 

This section describes a new two-pass authenticated key agreement protocol 
providing unilateral key confirmation. In practice, a key agreement protocol 
lets two entities agree on a secret key to be used for subsequent cryptographic 
communication. An entity, the initiator, who wants to communicate with a second 
entity, the responder, first sends his information to the responder. He then 
receives the responder's information together with an encrypted message from 
the responder, who has computed the secret key from the initiator's information 
and his own and used it to encrypt the message. The initiator also computes the 
secret key, and is assured that the responder possesses that particular secret 
key by verifying the encrypted message from the responder. Such a process is 
possible by exchanging only two flows. 

Protocol 6 is a two-pass key agreement protocol which provides mutual IKA 
and unilateral EKA. It also provides many desirable security attributes. In the 
protocol, S_A(M) denotes A’s signature over M and ID_A denotes A’s identity 
information. 

Protocol 6 

1. (a) A generates a random integer r_A, 1 ≤ r_A ≤ n - 1, and computes the 
point R_A = r_A P. 

(b) A computes S_A(R_A, ID_A) and sends this with R_A to B. 

2. (a) B does an embedded key validation of R_A. If the validation fails, then 
B terminates the protocol run with failure. 

(b) B verifies the signature S_A(R_A, ID_A) with A’s public key. If the 
verification fails, then B terminates the protocol run with failure. 

(c) Otherwise, B generates a random integer r_B, 1 ≤ r_B ≤ n - 1, and 
computes the point R_B = r_B P. 

(d) B computes K = c r_B W_A + c(w_B + r_B)R_A. If K = O, then B terminates 
the protocol run with failure. 

(e) B uses the x-coordinate z of the point K to compute two shared keys 
k = H_1(z) and k' = H_2(z). 

(f) B computes MAC_k'(ID_B, ID_A, R_B, R_A) and sends this with R_B to A. 

3. (a) A does an embedded key validation of R_B. If the validation fails, then 
A terminates the protocol run with failure. 

(b) Otherwise, A computes K = c r_A W_B + c(w_A + r_A)R_B. If K = O, then 
A terminates the protocol run with failure. 

(c) A uses the x-coordinate z of the point K to compute two shared keys 
k = H_1(z) and k' = H_2(z). 

(d) A computes MAC_k'(ID_B, ID_A, R_B, R_A) and verifies that this equals 
what was sent by B. 

4. The session key is k. 
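
A sketch of the two flows of Protocol 6 is given below. Several things in it are assumptions rather than choices made in the paper: the signature scheme is not specified in the text, so a toy Schnorr signature with a separate hypothetical signing key x_A is used; the elliptic curve is replaced by a toy multiplicative subgroup of Z_p^*; HMAC-SHA1 stands in for the MAC; and all key values and function names are illustrative.

```python
import hashlib
import hmac

p, q, g = 2039, 1019, 4           # toy group: g has prime order q in Z_p^* (not secure)
ID_A, ID_B = b"A", b"B"
w_A, w_B = 123, 456               # arbitrary static private keys
W_A, W_B = pow(g, w_A, p), pow(g, w_B, p)
x_A = 789                         # hypothetical signing key of A (scheme assumed)
X_A = pow(g, x_A, p)


def enc(x):
    return x.to_bytes(2, "big")


def H(label, *parts):
    return hashlib.sha1(label + b"".join(parts)).digest()


def sign(x, nonce, msg):
    """Toy Schnorr signature (e, s) with commitment g^nonce."""
    e = int.from_bytes(H(b"sig", enc(pow(g, nonce, p)), msg), "big") % q
    return e, (nonce + x * e) % q


def verify(X, msg, sig):
    e, s = sig
    commit = pow(g, s, p) * pow(X, q - e, p) % p      # g^s * X^(-e)
    return e == int.from_bytes(H(b"sig", enc(commit), msg), "big") % q


def validate(R):
    if not (1 < R < p and pow(R, q, p) == 1):
        raise ValueError("key validation failed")


def shared(r, w, W_peer, R_peer):
    K = pow(W_peer, r, p) * pow(R_peer, (w + r) % q, p) % p
    if K == 1:
        raise ValueError("shared secret is the identity")
    return K


def run_protocol6(r_A, r_B, nonce):
    # Flow 1 (A -> B): R_A with A's signature; both can be prepared off-line.
    R_A = pow(g, r_A, p)
    sig = sign(x_A, nonce, enc(R_A) + ID_A)
    # B: steps 2(a) to 2(f).
    validate(R_A)
    if not verify(X_A, enc(R_A) + ID_A, sig):
        raise ValueError("signature rejected")
    R_B = pow(g, r_B, p)
    z_B = shared(r_B, w_B, W_A, R_A)
    k_B, kc_B = H(b"01", enc(z_B)), H(b"10", enc(z_B))
    transcript = ID_B + ID_A + enc(R_B) + enc(R_A)
    tag = hmac.new(kc_B, transcript, hashlib.sha1).digest()
    # Flow 2 (B -> A): R_B with the MAC; A runs steps 3(a) to 3(d).
    validate(R_B)
    z_A = shared(r_A, w_A, W_B, R_B)
    k_A, kc_A = H(b"01", enc(z_A)), H(b"10", enc(z_A))
    if not hmac.compare_digest(tag, hmac.new(kc_A, transcript, hashlib.sha1).digest()):
        raise ValueError("B's MAC rejected")
    return k_A, k_B


k_A, k_B = run_protocol6(r_A=321, r_B=654, nonce=77)
```

Only B's MAC is checked inside the protocol, so key confirmation runs from B to A; A's side is confirmed later by its first message under k.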

Security 

Entity A sends R_A and S_A(R_A, ID_A) to B, so the signature of A provides 
entity authentication (the process whereby one entity is assured of the identity 
of a second entity involved in a protocol, and that the second has actually 
participated) of A to B. The MAC sent from B to A provides entity authentication 
of B to A and key confirmation of B to A. Subsequently A and B communicate 
using the shared key. An encrypted message sent from A then gives an assurance 
to B that A actually possesses the key (i.e., key confirmation of A to B) 
during a ‘real-time’ communication. Protocol 6 also provides the K-KS, FS, K-CI 
and UK-S attributes for the same reasons as Protocol 5. 

Performance 

This protocol requires the additional computation of the signature. However, 
since S_A(R_A, ID_A) can be precomputed by A, the on-line work increases only 
by B's computation for verifying the signature. Above all, Protocol 6 needs one 
flow fewer than Protocol 5, which is an advantage because a flow is the 
dominant burden on performance. 

D Comparison 

This section compares the security attributes and performance of all the 
protocols discussed so far. Table 1 presents the shared secret of Protocol 1 to 
Protocol 6, and A’s computations for generating the shared secret. The shared 
secret of Protocol 1 is composed of r_A w_B P and r_B w_A P, that of Protocol 2 
of r_A r_B P and w_A w_B P, that of Protocol 3 of r_A r_B P, r_A w_B R̄_B P, 
r_B w_A R̄_A P and w_A w_B R̄_A R̄_B P, and that of Protocols 4, 5 and 6 of 
r_A r_B P, r_A w_B P and r_B w_A P. 



Protocol                     Shared secret (K)                  A’s computations 

Protocol 1 (MTI/A0)          (r_A w_B + r_B w_A)P               K = r_A W_B + w_A R_B 

Protocol 2 (Unified Model)   (w_A w_B)P || (r_A r_B)P           K = w_A W_B || r_A R_B 

Protocol 3 (MQV)             (r_A + R̄_A w_A)(r_B + R̄_B w_B)P    TiA = RAraoA 
                                                                s_A = r_A + R̄_A w_A mod n 
                                                                K = s_A(R_B + R̄_B W_B) 

Protocols 4,5,6 (proposed)   (r_A r_B + r_A w_B + r_B w_A)P     K = r_A W_B + (w_A + r_A)R_B 

Table 1. The shared secret of AK protocols and A’s required computations 
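
As a check on the entry for the proposed protocols in Table 1, expanding A's computation with W_B = w_B P and R_B = r_B P recovers the tabulated shared secret (the cofactor c is omitted here, as in the table):

```latex
\begin{align*}
K &= r_A W_B + (w_A + r_A) R_B \\
  &= r_A w_B P + (w_A + r_A)\, r_B P \\
  &= (r_A r_B + r_A w_B + r_B w_A) P .
\end{align*}
```

B's computation r_B W_A + (w_B + r_B) R_A expands to the same point by symmetry.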



Table 2 contains a summary of the services that are believed to be provided 
by the AK and AKC protocols discussed in Sections 2 and 3. The services are 
discussed in the context of an entity A who has successfully executed the key 
agreement protocol over an open network wishing to establish keying data with 
entity B. The provision of these assurances is considered in the case that both 
A and B are honest and have always executed the protocol correctly [1]. 

In Table 3, the number of scalar multiplications required in each protocol 
is compared. Protocols 1, 2, 4 and 5 each require 3 scalar multiplications 
without precomputations (of quantities involving the entity’s static and 
ephemeral keys and the other entity’s static keys) and 1 scalar multiplication 
with precomputations. Protocol 3 requires 2.5 and 1.5 scalar multiplications, 
respectively. 
Protocol                               IKA   EKA   K-KS   FS    K-CI   UK-S 

Protocol 1 (AK)                        o     X     o      X     o      o 
Three-pass MTI/A0 (AKC)                o     o     o      X     o      o 
Protocol 2 (AK)                        o     X     †a     †b    X      o 
Three-pass Unified Model (AKC)         o     o     o      o     X      o 
Protocol 3 (AK)                        o     X     o      †b    o      X 
Three-pass MQV (AKC)                   o     o     o      o     o      o 
Protocol 4 (proposed AK)               o     X     o      †b    o      o 
Protocol 5 (proposed AKC)              o     o     o      o     o      o 
Protocol 6 (proposed unilateral AKC)   o     >     o      o     o      o 

o : the assurance is provided to A no matter whether A initiated the protocol or not. 
X : the assurance is not provided to A by the protocol. 
> : the assurance is provided to A only if A is the protocol’s initiator. 
†a : Here the technicality hinges on the definition of what constitutes another 
session key. The service of known-key security is certainly provided if the 
protocol is extended so that explicit authentication of all session keys is 
supplied [4]. 
†b : Again the technicality concerns key confirmation. Both protocols provide 
forward secrecy if explicit authentication is supplied for all session keys. If 
it is not supplied, then the service of forward secrecy cannot be guaranteed [3]. 

Table 2. Security services offered by AK and AKC protocols 



Total number     Without precomputations   With precomputations 

Protocol 1       3                         1 
Protocol 2       3                         1 
Protocol 3       2.5                       1.5 
Protocols 4,5    3                         1 

Table 3. The number of scalar multiplications per entity of key agreement protocols 



As noted in Section 3, MACs can be computed efficiently and hence the AKC 
variants have essentially the same computational overhead as their AK 
counterparts [4]. However, they require an extra flow, which dominates 
performance. In this sense, Protocol 6 is attractive because it requires only 
two flows, like AK Protocol 4. It satisfies more security attributes than 
Protocol 4 and the same ones as Protocol 5, except that it provides only EKA of 
B to A. However, since EKA of A to B can be provided in the subsequent 
cryptographic communication, omitting this service from the key agreement 
protocol does not matter for security. On the other hand, the computation 
overhead of the signature is required. As the signature can be precomputed, 
only the computation for verifying the signature is added to the on-line work. 
Consequently, Protocol 6 can be said to be more efficient than Protocol 5 and 
more secure than Protocol 4. 

As shown in Table 2, Protocol 4 provides more desirable security attributes 
than the other AK protocols suggested so far. If EKA is additionally provided 
in AK protocols, they possess many desirable security attributes. Even so, the 
AKC MTI/A0 still does not provide FS and the AKC Unified Model does not provide 
K-CI, while the AKC MQV provides UK-S, which the AK MQV does not exhibit. 
AKC Protocol 5 provides all security attributes in Table 2, as does the AKC 
MQV. The AKC MQV requires 2.5 scalar multiplications per entity without 
precomputations and 1.5 scalar multiplications with precomputations, like 
Protocol 3, while Protocol 5 requires 3 and 1 scalar multiplications, 
respectively. So, in the case that precomputations are discounted, Protocol 5 
is more efficient than the MQV protocol in on-line processing. 

E Concluding Remarks 

Protocol 4 (AK protocol) has been proposed to provide the desirable security 
attributes which are not all provided by the MTI/A0, two-pass Unified Model and 
two-pass MQV protocols: known-key security, forward secrecy, key-compromise 
impersonation and unknown key-share attributes. 

Protocol 5 (AKC protocol), derived from Protocol 4, provides all the security 
requirements discussed in jS]: implicit key authentication, explicit key 
authentication, known-key security, forward secrecy, key-compromise 
impersonation and unknown key-share attributes. The three-pass MQV protocol 
(AKC protocol) also possesses the same security attributes as Protocol 5. 
However, if precomputations (of quantities involving the entity’s static and 
ephemeral keys and the other entity’s static keys) are discounted, Protocol 5 
requires fewer scalar multiplications than the MQV protocol. 

Protocol 6, the two-pass unilateral AKC protocol, has been designed as 
another variant of Protocol 4. It provides not only mutual implicit key 
authentication but also unilateral explicit key authentication. In return, it 
requires the computation overhead related to the signature and the MAC. 
However, since the signature can be precomputed and the MAC can be efficiently 
computed, in practice the only computation added to the on-line work is the 
verification of the signature. So, Protocol 6 has an advantage in security over 
Protocol 4 and in performance over Protocol 5. 

The proposed protocols have been shown informally, by heuristic arguments, 
to provide the claimed security attributes. The following three conjectures are 
required to show that the security of the proposed protocols can be rigorously 
proved in the Bellare-Rogaway model jS] of distributed computing, as done in |S]. 

(i) Protocol 4 is a secure AK protocol provided that ECDHS (Elliptic Curve 
Diffie-Hellman Scheme) is secure. 

(ii) Protocol 5 is a secure AKC protocol provided that ECDHS and MAC are 
secure, and H_1 and H_2 are independent random oracles. 

(iii) Protocol 6 is a secure unilateral AKC protocol provided that ECDHS, MAC 
and the digital signature scheme are secure, and H_1 and H_2 are independent 
random oracles. 
Abstract. In this paper we propose a model for an anonymous traceability 
scheme with unconditional security. The model uses two subsystems: an 
authentication code with arbiter (A²-code) and an asymmetric traceability 
scheme with unconditional security. We analyse the security of the system 
against possible attacks and show that the security parameters of the system 
can be described in terms of those of the two subsystems. We give an example 
construction for the model. 



Keywords: Digital fingerprint, traceability schemes, anonymity, unlinkability, 
authentication codes. 

A Introduction 

One of the main challenges in the distribution of digital products is preventing 
illegal copying and reselling. Fingerprinting allows a merchant to insert the iden- 
tity information of the buyer into the copy and so if an illegal copy is found it 
is possible to identify the malicious buyer. A fingerprint is a sequence of marks 
that is unique to a buyer. A group of buyers who have bought different copies of 
the same product containing different fingerprints, can compare their copies, find 
some of the marks and alter the marks to produce a pirate copy. Collusion secure 
fingerprinting 0, m provides protection against collusion of buyers and ensures 
that collusion of a group of buyers cannot frame another user, or construct an 
illegal copy that cannot be traced to any colluder. Boneh and Shaw showed that 
totally c-secure codes, that is, codes that allow one traitor to be identified, 
do not exist and proposed a probabilistic approach to collusion secure fingerprinting. 
Traitor tracing schemes are also studied in the context of broadcast encryption 
systems, and in unconditionally secure m and computationally secure jS] mod- 
els. In the traditional model of traitor tracing schemes the merchant is trusted 
and protection is against colluding buyers. Asymmetric tracing schemes provide 
protection against a dishonest merchant who may frame the buyer by inserting 
his fingerprint in a second copy of the object. In 0, 0 computationally secure 
anonymous tracing schemes are proposed. Anonymous schemes are asymmetric: 
that is the merchant is not assumed honest. The merchant only knows the buyer 
through a pseudo-identity and cannot link various purchases made by the same 
buyer. 

In this paper, we propose a model for an unconditionally secure anonymous 
traceability scheme that provides security against collusion of malicious buyers 
and cheating of other participants including the merchant. The model employs 
a coin system similar to the one proposed in [7] to allow the buyer to hide his 
identity while permitting the merchant to have enough information about the 
buyer to reveal his identity with the help of a third party. The system uses 
asymmetric authentication codes and asymmetric traceability schemes as 
building blocks and its security is directly related to the security of the two 
underlying schemes. We give an example construction and prove its security. 

The paper is organized as follows. Asymmetric authentication codes and 
traceability schemes are recalled in Section B. The model is described in 
Section C and the construction of an unconditionally secure coin system is 
given in Section D. An example for the scheme is shown in Section E and 
finally, the paper is concluded in Section F. 

B Preliminaries 
B.1 A²-Codes 

A²-codes are introduced by jS] and further studied in [1] and |S]. In an A²-code 
there is a sender, a receiver and an arbiter, none of them assumed trusted. The 
code allows the sender to send an authenticated message to the receiver over 
a public channel such that the chance of success of an outsider or one of the 
insiders in an attack is less than a designed value. An A²-code is defined by a 
triple (E_T, E_R, E_A) of matrices. The matrices E_T, E_R and E_A are used by 
the sender, the receiver, and the arbiter, respectively. Columns of E_T are 
labelled by s_1, s_2, ..., s_k, which form S, the set of source states. A source 
state s will be encoded to a message in the message space 
M = {m_1, m_2, ..., m_n}, |M| > |S|, using a row of E_T that consists of k 
distinct messages m_1, m_2, ..., m_k and defines an encoding rule for the 
sender. We say that a pair (s, m) is valid for an encoding rule e if m appears 
in the column s of e. Columns of E_R are labelled by m_1, m_2, ..., m_n. Each 
row of E_R consists of multiple occurrences of symbols of the set 
{s_1, s_2, ..., s_k, -}, and defines a verification rule for the receiver. We 
say that a pair (s, m) is valid for a verification rule v if s appears in the 
column m in row v. Similarly, columns of E_A are labelled by m_1, m_2, ..., m_n. 
Each row of E_A consists of multiple occurrences of symbols from the set 
{s_1, s_2, ..., s_k, -}, and defines an arbitration rule for the arbiter. We say 
that a pair (s, m) is valid for an arbitration rule a if s appears in column m 
of row a. 
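
The three validity notions can be made concrete with a small sketch. The rules below are hypothetical toy rows with two source states and four messages, chosen only to show the data layout; they are not a secure code and do not come from the paper. An encoding rule is stored as a map from source state to message, and verification and arbitration rules as maps from message to source state, with None for the symbol "-".

```python
# One row of each matrix (toy values): e from E_T, v from E_R, a from E_A.
e = {"s1": "m1", "s2": "m3"}
v = {"m1": "s1", "m2": "s1", "m3": "s2", "m4": None}
a = {"m1": "s1", "m2": None, "m3": "s2", "m4": "s2"}


def valid_for_encoding(rule, s, m):
    """(s, m) is valid for an encoding rule if m appears in column s."""
    return rule.get(s) == m


def valid_for_lookup(rule, s, m):
    """(s, m) is valid for a verification or arbitration rule if s appears in column m."""
    return rule.get(m) == s


def accepted(s, m):
    """A pair succeeds only if both the receiver and the arbiter accept it."""
    return valid_for_lookup(v, s, m) and valid_for_lookup(a, s, m)


honest_pairs_ok = all(accepted(s, m) for s, m in e.items())
receiver_only = accepted("s1", "m2")   # valid for v but not for a
arbiter_only = accepted("s2", "m4")    # valid for a but not for v
```

In this toy instance the pair (s1, m2) is valid for the receiver's rule only and (s2, m4) for the arbiter's rule only, so neither party can pass such a pair off as legitimate by itself.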

An A²-code can protect against the following attacks. 

1. Attack O_i. Observing a sequence of i legitimate pairs (s_1, m_1), (s_2, m_2), 
..., (s_i, m_i), the opponent places another pair (s, m) ≠ (s_1, m_1), ..., 
(s_i, m_i) into the channel. He is successful if both the receiver and the 
arbiter accept (s, m) as a valid pair. 

2. Attack R_i. Receiving a sequence of i legitimate pairs (s_1, m_1), (s_2, m_2), 
..., (s_i, m_i) and using her key (a verification rule), the receiver claims 
that she has received a pair (s, m) ≠ (s_1, m_1), ..., (s_i, m_i). She is 
successful if the arbiter accepts (s, m) as a valid pair under her arbitration 
rule. 

3. Attack A_i. Knowing a sequence of i legitimate pairs (s_1, m_1), (s_2, m_2), 
..., (s_i, m_i) and using her key (an arbitration rule), the arbiter puts 
another pair (s, m) ≠ (s_1, m_1), ..., (s_i, m_i) into the channel. She is 
successful if the pair (s, m) is valid for the receiver. 

4. Attack T. Using his key (an encoding rule) e, the sender sends a fraudulent 
pair (s, m) which is not valid under e. He succeeds if both the receiver and 
the arbiter accept the pair. 

Denote by P_{O_i}, P_{R_i}, P_{A_i} and P_T the success probabilities of 
attacks O_i, R_i, A_i and T, respectively. To construct a secure and efficient 
A²-code, (i) the chances of success in all attacks, P_{O_i}, P_{R_i}, P_{A_i} 
and P_T, must be as small as possible, and (ii) the sizes of the required key 
spaces |E_T|, |E_R|, |E_A| and of the message space |M| must be as small as 
possible. 

B.2 Asymmetric Tracing 

Asymmetric tracing schemes with unconditional security are proposed in [5]. In 
these schemes the merchant is not trusted and insertion of a fingerprint requires 
collaboration of the merchant and an arbiter. In the asymmetric tracing schemes 
proposed in [5], the merchant only knows part of the fingerprint inserted in the 
object. This ensures that the merchant’s chance of success in framing a buyer 
will be limited. Fingerprinting is performed in two steps: the merchant inserts 
part of the fingerprint and passes the partially fingerprinted object to the ar- 
biter who then inserts the rest of the fingerprint. The fully fingerprinted object 
is given to the buyer directly without the merchant being able to access it. In the 
asymmetric tracing schemes collusion-tolerance is retained as in the symmetric 
schemes: that is whenever a certain number of buyers collude to produce a pirate 
copy, the merchant individually can trace it back to one of the traitors. If the 
buyer does not agree with the merchant’s accusation, the arbiter runs an arbi- 
tration algorithm on the whole fingerprint and accepts or rejects the merchant’s 
verdict. This ensures that an innocent buyer will never be accused and always 
one of the colluding buyers can be traced. 

C Anonymous Tracing 

There are n buyers B_1, B_2, ..., B_n, a merchant M, a registration center R, a 
coin issuer (or bank) I, and an arbiter A. Suppose a buyer wants to buy a 
digital object, for example a piece of software, from the merchant but does not 
want to reveal his identity. The merchant fingerprints the object for the buyer. 
To make fingerprints anonymous, the buyer first obtains a pseudo-identity from 
R, by visiting him and providing identification documents. R stores the real 
identity of the buyer and his pseudo-identity in a secure database. Next the 
buyer presents his pseudo-identity to I, who can verify the correctness of the 
pseudo-identity and issue coins to the buyer. A coin is verifiable by the 
merchant and the arbiter, who collaboratively insert the fingerprint into the 
object. Coins have no monetary value and their main purpose is to provide 
revocable anonymity. The coins issued to a buyer are unlinkable, that is, they 
do not point to the same buyer. 

The system operation has four phases: registration, fingerprinting, tracing 
and arbitration. 

1. Registration has two subphases: pseudo-identity generation and coin 
withdrawal. The registration center generates a pseudo-identity secretly and 
gives it to the buyer. The buyer shows his pseudo-identity to the coin issuer 
and receives up to i coins. 

2. In the fingerprinting phase, the system works in the same way as an 
asymmetric traceability scheme. The only difference is that both the merchant 
and the arbiter use their knowledge of the coin to check and store a coin that 
the buyer provides in lieu of his real identity. 

3. Tracing has two phases, tracing an object to a coin and revealing the identity. 
Tracing a pirate copy back to one coin is done by the merchant without 
assistance of any other principal. Revealing an identity needs cooperation of 
the merchant, the coin issuer, the arbiter and the registration center. This 
ensures that only the identity of the correctly accused buyer will be revealed. 

4. Arbitration is called when the buyer disagrees with the merchant’s accusa- 
tion. There are two possibilities: (i) Arbitration is after revealing the identity 
of the buyer in which case the arbiter directly rules between the buyer and 
the merchant; and (ii) Arbitration is before revealing the buyer’s identity. 
In this latter case, the registration center informs the buyer of the accusa- 
tion and in the case of disagreement acts on his behalf and approaches the 
arbiter. 

The process of tracing and revealing the identity will be as follows. 

Suppose the merchant finds a pirate object O. He runs his tracing algorithm 
to trace the object to one of the colluding buyers and finds the fingerprint φ 
of a traitor. He searches his database for the fingerprint and finds C_φ, which 
is the coin matching φ. Then, he presents the triple (O, φ, C_φ) to the arbiter, 
who will verify that (i) O includes the fingerprint φ, and (ii) φ matches the 
coin C_φ. This is to ensure that a cheating merchant cannot misuse the system 
to obtain the pseudo-identity of an innocent buyer. 

Next, the merchant presents (O, φ, C_φ) together with the verification 
information provided by the arbiter (alternatively the arbiter sends this 
information directly) to I and requests the buyer’s pseudo-identity. I verifies 
the presented information and uses his coin database to find the 
pseudo-identity. The merchant presents the pseudo-identity to the registration 
center, who has stored the pair (ID, Inum) in a secure database during the 
registration phase. The two possibilities discussed in the Arbitration phase 
above apply to this stage. That is, either the registration centre reveals the 
identity and allows the merchant to proceed with the accusation, or the 
registration centre approaches the buyer and, only if there is no objection, 
reveals the identity information to the merchant. If the buyer does not accept 
the accusation, the registration centre presents the information to the 
arbiter, who will accept or reject the accusation. The buyer’s identity is only 
revealed if the accusation is accepted by the arbiter. We note that the 
difference between the two alternatives is mainly operational and does not 
affect the design of the system. 
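
The chain of lookups in the tracing and revealing process can be sketched with plain dictionaries. This is a schematic under stated assumptions: the databases, their contents, and the function names are hypothetical, a pirate object is modelled simply as the set of fingerprints it contains, and the arbiter is assumed to keep its own coin-to-fingerprint record from the fingerprinting phase.

```python
merchant_db = {"phi_17": "coin_17"}      # fingerprint -> coin, kept by the merchant
arbiter_db = {"coin_17": "phi_17"}       # coin -> fingerprint, kept by the arbiter
issuer_db = {"coin_17": "pseudo_4"}      # coin -> pseudo-identity, kept by the coin issuer
registry_db = {"pseudo_4": "buyer_4"}    # pseudo-identity -> identity, kept by the registration center


def arbiter_check(pirate_marks, phi, coin):
    """Checks (i) and (ii): the object carries phi, and phi matches the coin."""
    return phi in pirate_marks and arbiter_db.get(coin) == phi


def trace_and_reveal(pirate_marks, phi):
    """Follow fingerprint -> coin -> pseudo-identity -> identity, or return None."""
    coin = merchant_db.get(phi)
    if coin is None or not arbiter_check(pirate_marks, phi, coin):
        return None                       # nothing is released without the arbiter's confirmation
    pseudo = issuer_db.get(coin)
    return registry_db.get(pseudo)


found = trace_and_reveal({"phi_17", "phi_3"}, "phi_17")
framed = trace_and_reveal({"phi_3"}, "phi_17")   # fingerprint not present in the object
```

Each party holds only one link of the chain, which is why revealing an identity needs all of them to cooperate.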

The process of arbitration will be as follows. If the buyer does not accept the 
accusation, he presents his coin to the arbiter. The arbiter will obtain the 
merchant’s evidence consisting of (O, φ, C_φ) and use this information to 
independently find the identity of the buyer by contacting I and R. If the 
identity matches the buyer’s, then the accusation is held valid. This ensures 
that the merchant cannot cheat either at the stage of submitting the 
pseudo-identity to R or after that. 

From the above description it is seen that the role of the arbiter in the system 
is twofold. Firstly to verify the information presented by the merchant before 
the buyer’s pseudo-identity is revealed, and secondly to arbitrate between the 
merchant and the buyer. 

In the following we describe a system with the above four phases using an 
unconditionally secure asymmetric authentication code and an unconditionally 
secure asymmetric fingerprinting scheme. The A²-code is used during the 
registration and coin withdrawal stage. At the end of this stage, the buyer has 
a number of coins that he can use for making purchases. The coins are 
verifiable by the merchant and the arbiter, and cannot be forged or tampered 
with without a high chance of being caught. 

D A Coin System 

In this section we set up a coin system using an A²-code. The system provides 
unconditional security and by choosing an appropriate A²-code the chance of 
breaking the security of the system can be made small. 

D.l Construction of Coins 

Suppose there is an A²-code (E_R, E_M, E_A) where E_R denotes the sender’s code, 
E_M denotes the receiver’s code and E_A denotes the arbiter’s code. The above 
three codes are public and are used by the registration center, the merchant, 
and the arbiter respectively. We also use the same notation to refer to the key 
space in each of the three codes, so for example E_R also refers to the set of 
keys used by the registration center. 

Let f, g, h be three injective functions: 

f : E_R → {0,1}^{n_1} 
g : E_M → {0,1}^{n_2} 
h : E_A → {0,1}^{n_3} 

where n_1 >> log|E_R|, n_2 >> log|E_M|, and n_3 >> log|E_A|. We note that f 
being injective and 2^{n_1} >> |E_R| implies that if f is known, the knowledge 
of f(e) can uniquely determine e. However, without the knowledge of f, the 
chance of choosing a binary string of length n_1 that is the image of an 
element of E_R is very small. 
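
To give a sense of scale for this chance, suppose (as an assumption for illustration) that from the forger's point of view f is a uniformly random injection and that he makes a single guess. Writing E_R for the sender's key space, the probability that the guess lands in the image of f is then the fraction of n_1-bit strings that are images. The numbers below are illustrative only and do not come from the paper.

```latex
\Pr[\text{guess} \in f(E_R)] \;=\; \frac{|E_R|}{2^{n_1}},
\qquad \text{e.g. } |E_R| = 2^{20},\ n_1 = 100
\;\Longrightarrow\; \frac{2^{20}}{2^{100}} = 2^{-80}.
```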

We will assume that: 

A1: f is secret and is only known by R and I. 

A2: g is secret and is only known by M and I. 

A3: h is secret and is only known by A and I. 

To obtain a coin, a buyer B must follow the following steps. 

1. Identity verification: for this, the buyer needs to prove his identity ID_B 
to R. R verifies that B is actually who he is claiming to be, selects an 
element e ∈ E_R and gives f(e) to B as B’s pseudo-identity. It also securely 
stores (e, ID_B) for the tracing phase. 

2. B presents f(e) to I. I checks the validity of f(e) by verifying that it is 
in the image set of f. Next I selects an element v ∈ E_M and an element 
a ∈ E_A according to the key distribution of the A²-code. That is, the triple 
(e, v, a) forms a valid key set for the A²-code. 

3. I produces a coin C = (s, m, g(v), h(a)), not used before, where m 
corresponds to the source state s under the encoding rule e. Finally, I 
securely stores (C, f(e)). At this stage I can produce many coins using the 
same rule, and each of them matches the same e. Note that the key pairs (v, a) 
might not be the same. 
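
The three steps can be sketched in code. Everything concrete here is an assumption for illustration: the tiny key spaces, the key distribution table, the 64-bit image length, and the function names are hypothetical, and the secret injective functions f, g, h are modelled as private random lookup tables.

```python
import secrets


def secret_injection(domain, nbits):
    """Random injective map from `domain` to nbits-bit integers, held as a private table."""
    images = set()
    while len(images) < len(domain):
        images.add(secrets.randbits(nbits))
    return dict(zip(domain, images))


# Toy key spaces: each encoding rule maps a source state to a message.
E_R = {"e0": {"s1": "m1", "s2": "m3"}, "e1": {"s1": "m2", "s2": "m4"}}
key_distribution = {"e0": [("v0", "a0")], "e1": [("v1", "a1")]}   # valid (v, a) for each e

f = secret_injection(list(E_R), 64)           # known to R and I only
g = secret_injection(["v0", "v1"], 64)        # known to M and I only
h = secret_injection(["a0", "a1"], 64)        # known to A and I only
f_inverse = {image: e for e, image in f.items()}

issuer_db = {}                                # I's record: coin -> pseudo-identity


def register(e):
    """Step 1: R hands the pseudo-identity f(e) to the buyer."""
    return f[e]


def issue_coin(pseudo_id, s):
    """Steps 2 and 3: I checks f(e), picks (v, a) and issues C = (s, m, g(v), h(a))."""
    e = f_inverse.get(pseudo_id)
    if e is None:
        raise ValueError("pseudo-identity is not in the image of f")
    v, a = secrets.choice(key_distribution[e])
    coin = (s, E_R[e][s], g[v], h[a])
    if coin in issuer_db:
        raise ValueError("coin already issued")
    issuer_db[coin] = pseudo_id
    return coin


pseudo = register("e0")
coin = issue_coin(pseudo, "s1")
```

The coin itself carries only g(v) and h(a), so the merchant and the arbiter can each recover their own key while the pseudo-identity stays in the issuer's record.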

Although the merchant does not use h(a), he stores it as part of the whole 
coin (s, m, g(v), h(a)). This information could be used during the tracing 
phase. That is, if he can show that the whole coin that he has stored is the 
same as the one he received from the buyer, then finding the pseudo-identity by 
I does not need the input from the arbiter. However, the arbiter’s 
participation at this stage is necessary. 

D.2 Security of Coins 

The coins are produced by I and submitted by the buyer to the merchant, who 
later submits them to the arbiter to complete the fingerprint insertion process. 
During the tracing phase, the merchant submits an alleged coin, verified by the 
arbiter, to I. 

In the following we analyse the security of the coin system against coin 
forgery. That is, we consider the chance of success of various participants in 
constructing a coin which is acceptable during the fingerprinting phase (by the 
merchant and the arbiter) but becomes ineffective during tracing. We do not 
consider any attack from I as he can always issue coins. We recall that the 
aims of the coin system were (i) to hide the identity of an honest buyer, and 
(ii) to allow revealing the identity of a dishonest buyer who has constructed a 
pirate copy, once this accusation is proved. A successful attack means one or 
both of the above goals are not achieved. 
The following are possible attacks, including collusion attacks, on the coin 
system. However, we do not allow collusion between the merchant and the arbiter. 

An outsider may use his best strategy to construct a coin. We note that because 
R does not have any key information, he will have the same success chance as 
the outsider. 

A buyer may submit a fraudulent coin to the merchant during purchase. 

The merchant may submit a false coin (i) to A during the fingerprinting, (ii) 
to I during the tracing. 

An arbiter may collude with a buyer to construct a coin and make a purchase 
that is not traceable. 

We note that since R has no key information, his collusion with participants 
with key information (merchant, buyer and arbiter) need not be considered. 

The following two possible collusions need to be considered: (i) M and some 
buyers B_1, ..., B_i, (ii) A and some buyers B_1, ..., B_i. Because there is no 
information related to the buyers’ identities, the knowledge of the collusion of 
M and B_1, ..., B_i is equivalent to M holding i different coins. So the success 
chance is no better than the merchant’s chance. A similar argument can be used 
for the collusion of A and B_1, ..., B_i. 

Based on this discussion we define the following attacks. The aim of all the 
attacks is to construct a valid coin, which will be accepted by the merchant and 
the arbiter, and can not be traced. 

1. No key information of the $A^2$-code is available to the attacker(s).

We will use $P_0$ to denote the best chance of success in this attack. This attack might be by an outsider, the registration center, or a collusion of them.

2. $i$ valid coins, $(s_1, m_1, g(v_1), h(a_1)), \ldots, (s_i, m_i, g(v_i), h(a_i))$, are observed, but $v_1, \ldots, v_i$ and $a_1, \ldots, a_i$ are not known.

We will use $P_i$ to denote the best success chance in this attack. This attack might be by a single buyer, a collusion of buyers, a collusion of buyers and an outsider, or a collusion of buyers and the registration center.

3. $i$ valid coins, $(s_1, m_1, g(v_1), h(a_1)), \ldots, (s_i, m_i, g(v_i), h(a_i))$, are observed; $v_1, \ldots, v_i$ are known but $a_1, \ldots, a_i$ are not known.

The best success chance in this attack is denoted by $P_M$. This attack can be by the merchant, a collusion of the merchant and buyers, a collusion of the merchant and the registration center, or a collusion of the merchant and an outsider.

4. $i$ valid coins, $(s_1, m_1, g(v_1), h(a_1)), \ldots, (s_i, m_i, g(v_i), h(a_i))$, are observed; $a_1, \ldots, a_i$ are known but $v_1, \ldots, v_i$ are not known.

The best success chance in this attack is denoted by $P_A$. This attack can be by the arbiter, a collusion of the arbiter and buyers, a collusion of the arbiter and the registration center, or a collusion of the arbiter and an outsider.

We will analyse the above four kinds of attacks. 

For $P_0$, the attacker does not have any privileged information about a coin. However, since the $A^2$-code is public, there is a chance that he can construct a valid coin. The following theorem gives the best chance of success of the attacker(s).

Theorem 1. $P_0 = N(v, a; s, m)/2^{n_2+n_3}$, where $N(v, a; s, m)$ is the number of valid key pairs $(v, a)$ in the underlying $A^2$-code such that $(s, m)$ is valid for both $v$ and $a$.

Proof. To model a coin, the attacker has to produce (i) a valid representation $g(v)$ of a verification key $v \in E_M$, (ii) a valid representation $h(a)$ of an arbitration key $a \in E_A$, and (iii) a pair $(s, m)$ which is valid for $v$ and $a$. Since $g$ is a secret function, the chance of producing a valid string in its image space $\{0, 1\}^{n_2}$ is $1/2^{n_2}$. Since $h$ is secret, the chance of producing a valid string in its image space $\{0, 1\}^{n_3}$ is $1/2^{n_3}$. Since $E_q$ is public, the attacker can choose an $(s, m)$ that is valid according to $E_q$. So his success chance is $N(v, a; s, m)/2^{n_2+n_3}$.

For $P_i$, the attacker has $i$ coins and wants to construct another valid coin. The following theorem gives his success chance.

Theorem 2. $P_i = \max(P_{O_i}, N(v; s, m)/2^{n_2}, N(a; s, m)/2^{n_3})$, where $P_{O_i}$ is the probability of attack $O_i$ in the underlying $A^2$-code, $N(v; s, m)$ is the number of keys $v \in E_M$ in the $A^2$-code such that $(s, m)$ is valid for $v$, and $N(a; s, m)$ is the number of keys $a \in E_A$ in the $A^2$-code such that $(s, m)$ is valid for $a$.

Proof. The attacker has three ways of attacking the system.

One way is to keep one of the observed partial strings $(s, m, g(v))$ and to produce $h(a)$. His success chance is $N(a; s, m)/2^{n_3}$ because $h$ is a secret function. Similarly, he can keep one of the observed partial strings $(s, m, h(a))$ and produce $g(v)$. His success chance is $N(v; s, m)/2^{n_2}$ since $g$ is a secret function.

Alternatively, the attacker can keep one of the observed partial strings $(g(v), h(a))$ and produce $(s, m)$. The attacker's best chance is when the $i$ coins all have the same strings $g(v), h(a)$; that is, the $i$ coins are

$C_1 = (s_1, m_1, g(v), h(a))$

$C_2 = (s_2, m_2, g(v), h(a))$

$\vdots$

$C_i = (s_i, m_i, g(v), h(a))$

In this case the attacker's best strategy is to keep $g(v), h(a)$ and substitute a pair $(s, m)$ such that (i) $(s, m) \neq (s_1, m_1), (s_2, m_2), \ldots, (s_i, m_i)$, and (ii) $(s, m)$ is valid for $v$ and $a$. This probability is $P_{O_i}$ in the underlying $A^2$-code.

For $P_M$, the attacker knows the verification key and its image under $g$. After observing the strings $h(a_1), h(a_2), \ldots, h(a_i)$ on the coins he has a chance to model a coin. The following theorem gives the probability of this attack.

Theorem 3. $P_M = \max(P_{R_i}, N(a; s, m)/2^{n_3})$, where $P_{R_i}$ is the probability of attack $R_i$ in the underlying $A^2$-code, and $N(a; s, m)$ is the number of keys $a \in E_A$ in the $A^2$-code such that $(s, m)$ is valid for $a$.

Proof. Because the attacker knows the possible values of $v$ and their images $g(v)$, he has two ways of attacking the system: (i) choosing $(s, m, g(v))$ and guessing a valid $h(a)$, or (ii) keeping one observed $h(a)$ and choosing $(s, m, g(v))$ to match $h(a)$.

After choosing $(s, m, g(v))$, the success probability of guessing a valid $h(a)$ is $N(a; s, m)/2^{n_3}$.

On the other hand, keeping an observed $h(a)$ and choosing $(s, m, g(v))$ to match $h(a)$ has the best success chance if the $i$ coins all contain the same partial string $h(a)$. That is, the $i$ coins are given by

$C_1 = (s_1, m_1, g(v_1), h(a))$

$C_2 = (s_2, m_2, g(v_2), h(a))$

$\vdots$

$C_i = (s_i, m_i, g(v_i), h(a))$

In this case the attacker's best strategy is to keep $h(a)$ and replace the $(s, m, g(v))$ part such that (i) $(s, m, g(v)) \neq (s_1, m_1, g(v_1)), (s_2, m_2, g(v_2)), \ldots, (s_i, m_i, g(v_i))$, and (ii) $(s, m, g(v))$ matches $a$. This probability is $P_{R_i}$ in the $A^2$-code.

Finally, $P_A$ is similar to $P_M$. The attacker knows the arbiter's key and its image under $h$. After observing the strings $g(v_1), g(v_2), \ldots, g(v_i)$ on the coins he wants to construct a valid coin. The following theorem gives the success chance of this attack. The proof is similar to that of Theorem 3 and is omitted here.

Theorem 4. $P_A = \max(P_{A_i}, N(v; s, m)/2^{n_2})$, where $P_{A_i}$ is the success probability of attack $A_i$ in the underlying $A^2$-code, and $N(v; s, m)$ is the number of keys $v \in E_M$ in the $A^2$-code such that $(s, m)$ is valid for $v$.

From Theorems 1 to 4 we can see that the probabilities of constructing a valid coin can be kept small by using a properly chosen $A^2$-code and choosing large enough $n_2$ and $n_3$.

D.3 The Number of Coins under One Registration 

Suppose the underlying $A^2$-code in our coin system can protect against $\ell$-spoofing attacks, that is, using one sender's key, $\ell$ legitimate pairs can be sent securely. Then each $g(v)$ can be used on at most $\ell$ coins. This can be controlled by the registration center $\mathcal{R}$ when he issues coins. By using such an $A^2$-code, every buyer can withdraw $\ell$ coins under one registration.

D.4 Anonymity and Unlinkability 

The main aim of the coins is to provide anonymity and unlinkability. First we review the anonymity provided by the proposed coin system. Observing a coin $C = (s, m, g(v), h(a))$, an outsider or the registration center learns the values of $s$ and $m$ and nothing else, so the coin holder's real identity is revealed to neither the outsider nor the registration center. Likewise, by seeing a coin $C = (s, m, g(v), h(a))$, the merchant or the arbiter obtains no information that reveals the coin holder's real identity. During withdrawal of a coin a buyer only shows his pseudo-identity $f(e)$, so the registration center $\mathcal{R}$ does not know who he actually is. This means that by showing a coin, buyers' real identities are not revealed to anyone.

Next we consider the unlinkability of the coin system. During the coin issuing phase, $\mathcal{R}$ randomly chooses $s, m, g(v)$ and $h(a)$, so it is possible to have different $g(v)$ and $h(a)$ on a single buyer's coins and/or the same $g(v)$ and $h(a)$ on different buyers' coins. So it is not possible to link the purchases that are made by the same buyer.



E An Example 

We describe an anonymous traceability scheme using an A^-code proposed in 
and an asymmetric traceability scheme given in [S]. 

Coins 

Construction of the coins is based on an A^-code with multiple use [^. Let 
Fq be a field of q elements, and d be an integer such that 






( 1 ) 



An element of $F_{q^d} = F_q[x]/(p(x))$ with $\deg(p(x)) = d$ can be thought of as a polynomial $a_0 + a_1 x + a_2 x^2 + \cdots + a_{d-1} x^{d-1}$, $a_i \in F_q$. Define a mapping $\phi : F_{q^d} \to F_q$ as

$\phi(a_0 + a_1 x + \cdots + a_{d-1} x^{d-1}) = a_0.$

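The registration center's key is represented as

$e = (e_1, e_2, e_3, e_4), \quad e_1, e_3 \in F_q^{\ell}, \quad e_2, e_4 \in F_{q^d},$    (2)

the merchant's key is represented as

$v = (f_1, f_2, f_3), \quad f_1 \in F_q^{\ell}, \quad f_2 \in F_{q^d}, \quad f_3 \in F_q,$    (3)

and the arbiter's key is represented as

$a = (e_1, e_2), \quad e_1 \in F_q^{\ell}, \quad e_2 \in F_{q^d}.$    (4)

A triple of keys, $(e, v, a)$, is valid if

$e_3 = f_1 + e_1 f_3$ and $e_4 = f_2 + e_2 f_3.$

An $\ell$-tuple above can be written as

$e_1 = (e_{11}, e_{12}, \ldots, e_{1\ell})^T, \quad e_3 = (e_{31}, e_{32}, \ldots, e_{3\ell})^T, \quad f_1 = (f_{11}, f_{12}, \ldots, f_{1\ell})^T.$
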
Let $s \in F_{q^d}$, and let each message be represented as

$m = (m_1, m_2, m_3), \quad m_2, m_3 \in F_q, \quad m_1 \in F_{q^d}.$

Using a key $e$ given by (2), $\ell$ pairs $(s_i, m_i)$, $i = 1, \ldots, \ell$, can be securely constructed. Let

$(s_1, m_1), (s_2, m_2), \ldots, (s_\ell, m_\ell)$    (5)

be $\ell$ pairs, where

$m_1 = (m_{11}, m_{21}, m_{31}), \; m_2 = (m_{12}, m_{22}, m_{32}), \; \ldots, \; m_\ell = (m_{1\ell}, m_{2\ell}, m_{3\ell}).$    (6)

We say the $\ell$ pairs are valid for an $e$ given by (2) if

$m_{1i} = s_i, \quad m_{2i} = e_{1i} + \phi(s_i e_2), \quad m_{3i} = e_{3i} + \phi(s_i e_4), \quad 1 \le i \le \ell.$    (7)

A pair $(s, m)$ is valid for a key $v$ given by (3) if

$m_3 = f_{1i} + \phi(m_1 f_2) + m_2 f_3$, for some $i$, $1 \le i \le \ell$.

A pair $(s, m)$ is valid for a key $a$ given by (4) if

$m_2 = e_{1i} + \phi(m_1 e_2)$, for some $i$, $1 \le i \le \ell$.

For a registration key $e$ one can withdraw $\ell$ coins satisfying (7).

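
The validity conditions above can be checked numerically. The following is a minimal sketch, assuming toy parameters that are not in the paper ($q = 5$, $d = 2$, $p(x) = x^2 + 2$, $\ell = 3$); all function names are illustrative. It builds a valid key triple from $e_3 = f_1 + e_1 f_3$ and $e_4 = f_2 + e_2 f_3$, forms pairs $(s, m)$ from the registration key, and confirms that each pair passes both the merchant's and the arbiter's check.

```python
import random

Q = 5              # field size q (small prime, illustration only)
D = 2              # extension degree d
P_LOW = [2, 0]     # low coefficients of p(x) = x^2 + 2, irreducible over F_5
ELL = 3            # number of pairs (coins) per registration key


def ext_mul(a, b):
    """Multiply two elements of F_{q^d}, given as coefficient lists of length D."""
    prod = [0] * (2 * D - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % Q
    # Reduce modulo p(x) using x^D = -(low part of p(x)).
    for k in range(2 * D - 2, D - 1, -1):
        c, prod[k] = prod[k], 0
        for t in range(D):
            prod[k - D + t] = (prod[k - D + t] - c * P_LOW[t]) % Q
    return prod[:D]


def phi(a):
    """The map phi: keep only the constant coefficient a_0."""
    return a[0]


def rand_ext(rng):
    return [rng.randrange(Q) for _ in range(D)]


def make_keys(rng):
    """Return a valid key triple (e, v, a)."""
    e1 = [rng.randrange(Q) for _ in range(ELL)]
    f1 = [rng.randrange(Q) for _ in range(ELL)]
    e2, f2 = rand_ext(rng), rand_ext(rng)
    f3 = rng.randrange(Q)
    e3 = [(f1[i] + e1[i] * f3) % Q for i in range(ELL)]   # e3 = f1 + e1*f3
    e4 = [(f2[t] + e2[t] * f3) % Q for t in range(D)]     # e4 = f2 + e2*f3
    return (e1, e2, e3, e4), (f1, f2, f3), (e1, e2)


def make_pair(e, i, s):
    """Build the i-th pair (s, m) from the registration key e, as in (7)."""
    e1, e2, e3, e4 = e
    m2 = (e1[i] + phi(ext_mul(s, e2))) % Q
    m3 = (e3[i] + phi(ext_mul(s, e4))) % Q
    return s, (s, m2, m3)


def valid_for_v(m, v):
    f1, f2, f3 = v
    m1, m2, m3 = m
    return any(m3 == (f1[i] + phi(ext_mul(m1, f2)) + m2 * f3) % Q
               for i in range(ELL))


def valid_for_a(m, a):
    e1, e2 = a
    m1, m2, _ = m
    return any(m2 == (e1[i] + phi(ext_mul(m1, e2))) % Q for i in range(ELL))
```

The checks succeed for every key triple because $\phi$ is $F_q$-linear: $e_{3i} + \phi(s e_4) = f_{1i} + \phi(s f_2) + f_3 (e_{1i} + \phi(s e_2))$.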
Fingerprints 

We use the system proposed in [H]. In that system, fingerprints correspond to blocks $B_f = \{(x, f(x)) : x \in F_q\}$ obtained from monic polynomials $f$ of degree $t$. Let $x_1, x_2, \ldots, x_s \in F_q$ be $s$ fixed elements of $F_q$. The merchant $\mathcal{M}$ embeds the $q - s$ points $(x, f(x))$, $x \in F_q \setminus \{x_1, \ldots, x_s\}$, as a partial fingerprint in the object and passes the object to the arbiter. The arbiter verifies that the merchant's codeword is correctly embedded, and then secretly embeds the remaining $s$ points $(x_1, f(x_1)), \ldots, (x_s, f(x_s))$.
In our example, we will choose $t$ as small as possible to reduce computation. A coin has four parts, $s, m, g(v), h(a)$, where $s \in F_q^{d}$, $m \in F_q^{d+2}$, and $g(v)$ and $h(a)$ are strings of lengths $n_2$ and $n_3$, respectively, with $v \in F_q^{d+2}$ and $a \in F_q^{d+1}$. So a coin $(s, m, g(v), h(a))$ is determined by a $(4d + 5)$-tuple over $F_q$. Hence a mapping $\Phi$ such that $\Phi(C) \in F_q^{4d+5}$ for a coin $C$, and $\Phi(C_1) \neq \Phi(C_2)$ for two different coins, can be used. After receiving a coin $C$, the merchant $\mathcal{M}$ uses $\Phi$ to generate a monic polynomial $f(x)$ of degree $t = 4d + 5$. That is, for $\Phi(C) = (a_1, \ldots, a_{4d+5})$, he constructs the polynomial $f(x) = a_1 + a_2 x + \cdots + a_{4d+5} x^{4d+4} + x^{4d+5}$. Since $\Phi$ is public, the arbiter is able to check whether the merchant's insertion of the fingerprint has been correct.
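
As an illustration of the fingerprint construction, the sketch below turns a coefficient tuple $\Phi(C)$ into the monic polynomial $f$ and splits the points $(x, f(x))$ between merchant and arbiter. It is only a sketch under stated assumptions: $q$ is taken to be a small prime so that $F_q$ is the integers mod $q$, the tuple and the arbiter's positions are made-up example values, and the function names are hypothetical.

```python
def monic_poly_from_coin(coeffs):
    """Coefficients (lowest degree first) of f(x) = a_1 + a_2 x + ... + a_t x^(t-1) + x^t."""
    return list(coeffs) + [1]


def evaluate(poly, x, q):
    """Evaluate the polynomial at x over F_q (q prime) with Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % q
    return acc


def split_fingerprint(poly, q, arbiter_xs):
    """Return (merchant_points, arbiter_points): q - s points and s points."""
    merchant = [(x, evaluate(poly, x, q)) for x in range(q) if x not in arbiter_xs]
    arbiter = [(x, evaluate(poly, x, q)) for x in arbiter_xs]
    return merchant, arbiter


# Example values (assumptions): q = 17, d = 1 so t = 4d + 5 = 9, s = 2.
EXAMPLE_Q = 17
EXAMPLE_PHI_C = (3, 1, 4, 1, 5, 9, 2, 6, 5)
EXAMPLE_POLY = monic_poly_from_coin(EXAMPLE_PHI_C)
MERCHANT_PART, ARBITER_PART = split_fingerprint(EXAMPLE_POLY, EXAMPLE_Q, [0, 1])
```

Because $\Phi$ is public, the arbiter can recompute the same polynomial from the coin and compare the merchant's $q - s$ points against it before adding his own $s$ points.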

There can be at most $q^{2d+2}$ buyers in this system, and a buyer can have at most $\ell$ coins. For attackers, the success probability of producing a coin is bounded by $1/q$ [H]. Fingerprints are secure against collusions of size at most

I -s + J s"^ + {Ad + b)q . 

JJTs J- 

Condition (1) ensures $c \ge 2$.

F Conclusions

In this paper we proposed a model for unconditionally secure anonymous traceability schemes. Using a coin system based on $A^2$-codes together with an asymmetric traceability scheme ensures that if a pirate copy is found, the merchant can always identify one of the colluders. It also ensures that no innocent buyer will be framed by either a cheating merchant or a colluding group of buyers. The coin system guarantees the anonymity of honest buyers and the unlinkability of their different purchases.

Abstract. Vaudenay [1] proposed a new way of protecting block ciphers against classes of attacks, which was based on the notion of decorrelation. He also suggested two block cipher families, COCONUT and PEANUT. Wagner [2] suggested a new differential-style attack called the boomerang attack and cryptanalysed COCONUT'98. In this paper we suggest a new block cipher called DONUT, which is built from two pairwise perfect decorrelation modules. DONUT is secure against the boomerang attack.



Key words: Block cipher, Decorrelation, Differential Cryptanalysis (DC), Linear Cryptanalysis (LC).

A Introduction 

Vaudenay [1] proposed a new way of protecting block ciphers against differential cryptanalysis (DC) [3,4] and linear cryptanalysis (LC) [5], which was based on the notion of decorrelation. This notion is similar to that of universal functions, which was introduced by Carter and Wegman [6,7]. Vaudenay also suggested two block cipher families, COCONUT (Cipher Organized with Cute Operations and NUT) and PEANUT (Pretty Encryption Algorithm with NUT). The COCONUT family uses a pairwise perfect decorrelation module and the PEANUT family uses a partial decorrelation module.

COCONUT'98 is a product cipher $C_3 \circ C_2 \circ C_1$, where $C_1$ and $C_3$ are 4-round Feistel ciphers and $C_2$ is a pairwise perfect decorrelation module. Wagner [2] suggested a new differential-style attack called the boomerang attack and cryptanalysed COCONUT'98. In this paper we suggest a new block cipher called DONUT (Double Operations with NUT), which is built from two pairwise perfect decorrelation modules. DONUT is secure against the boomerang attack.

This paper is organized as follows. In section 2, we recall the basic definitions used in decorrelation theory and present previous results of decorrelation theory. In section 3, we suggest a frame structure for the new cipher and show that this structure is secure against the boomerang attack. In section 4, we describe the detailed structure of DONUT, and in section 5, we conclude this paper.

* This work is supported by Korea Information Security Agency (KISA) grant 2000-S-078.

B Preliminaries 

In this section, we recall the basic definitions used in the decorrelation theory 
and briefly present the previous results pi5] . 

Definition 1. Given a random function $F$ from a given set $M_1$ to a given set $M_2$ and an integer $d$, we define the "d-wise distribution matrix" $[F]^d$ of $F$ as an $M_1^d \times M_2^d$-matrix where the $(x, y)$-entry of $[F]^d$ corresponding to the multipoints $x = (x_1, \ldots, x_d) \in M_1^d$ and $y = (y_1, \ldots, y_d) \in M_2^d$ is defined as the probability that we have $F(x_i) = y_i$ for $i = 1, \ldots, d$.

Each row of the d-wise distribution matrix corresponds to the distribution of the d-tuple $(F(x_1), \ldots, F(x_d))$, where $(x_1, \ldots, x_d)$ corresponds to the index of the row.

Definition 2. Given two random functions $F$ and $G$ from a given set $M_1$ to a given set $M_2$, an integer $d$ and a distance $D$ over the matrix space $\mathbb{R}^{M_1^d \times M_2^d}$, we define the "d-wise decorrelation D-distance between $F$ and $G$" as being the distance

$\mathrm{DecF}^d_D(F, G) = D([F]^d, [G]^d).$

We also define the "d-wise decorrelation D-bias of function $F$" as being the distance

$\mathrm{DecF}^d_D(F) = D([F]^d, [F^*]^d)$

where $F^*$ is a uniformly distributed random function from $M_1$ to $M_2$. Similarly, for $M_1 = M_2$, if $C$ is a random permutation over $M_1$ we define the "d-wise decorrelation D-bias of permutation $C$" as being the distance

$\mathrm{DecP}^d_D(C) = D([C]^d, [C^*]^d)$

where $C^*$ is a uniformly distributed random permutation over $M_1$.

In the above definition, $C^*$ is called the Perfect Cipher. If a cipher $C$ has zero d-wise decorrelation bias, we call $C$ a perfectly decorrelated cipher. When the message space $M = \{0, 1\}^m$ has a field structure, we can construct pairwise perfectly decorrelated ciphers on $M$ by $C(y) = A \cdot y + B$, where the key $K = (A, B)$ is uniformly distributed on $M^* \times M$ and $M^* = M - \{0\}$. This pairwise perfect decorrelation module is used in the COCONUT family.

COCONUT is a family of ciphers parameterized by $(p(x), m)$, where $m$ is the block size and $p(x)$ is an irreducible polynomial of degree $m$ in $GF(2)[x]$. A COCONUT cipher is a product cipher $C_3 \circ C_2 \circ C_1$, where $C_1$ and $C_3$ are any (possibly weak) ciphers, and $C_2$ is defined as follows:

$C_2(y) = A \cdot y + B \bmod p(x),$

where $A$, $B$ and $y$ are polynomials of degree at most $m - 1$ in $GF(2)[x]$. The polynomials $A$ and $B$ are secret and act as round keys. Since the COCONUT family has pairwise perfect decorrelation, the ciphers are secure against basic differential and basic linear cryptanalysis [1].
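
To make the pairwise perfect decorrelation property concrete, here is a small sketch over a toy field. It assumes $m = 8$ and the polynomial 0x11d ($x^8 + x^4 + x^3 + x^2 + 1$) purely for illustration; the real modules work on much larger blocks. For any two distinct inputs and any two distinct outputs, exactly one key $(A, B)$ with $A \neq 0$ maps the inputs to the outputs, and an input difference $a$ always produces the output difference $a \cdot A$, independent of $B$.

```python
M_BITS = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, irreducible over GF(2) (toy choice)


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo POLY (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M_BITS:      # degree reached 8: reduce
            a ^= POLY
        b >>= 1
    return r


def c2(y, A, B):
    """The decorrelation module C2(y) = A*y + B over GF(2^8)."""
    return gf_mul(A, y) ^ B


def count_keys(x1, x2, y1, y2):
    """Count keys (A, B), A != 0, with C2(x1) = y1 and C2(x2) = y2."""
    return sum(1
               for A in range(1, 1 << M_BITS)
               for B in range(1 << M_BITS)
               if c2(x1, A, B) == y1 and c2(x2, A, B) == y2)


def output_difference(y, a, A, B):
    """XOR difference of the outputs for inputs y and y ^ a."""
    return c2(y, A, B) ^ c2(y ^ a, A, B)
```

For example, `count_keys(3, 200, 17, 99)` enumerates all 255 x 256 keys and finds a single match, which is the uniform behaviour of a perfect cipher on pairs of distinct points.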

COCONUT'98 is a member of the COCONUT family parameterized by $(64, x^{64} + x^{11} + x^2 + x + 1)$ and uses 4-round Feistel structures for $C_1$ and $C_3$, respectively. Wagner cryptanalysed COCONUT'98 using the boomerang attack, which exploits the fact that high-probability differentials exist for both $C_1$ and $C_3$.

C Frame Structure 

C.l Frame Structure of New Cipher 

Luby and Rackoff [9] showed a method for constructing a pseudorandom permutation from a pseudorandom function. This method is based on composing four Feistel permutations, each of which requires the evaluation of a pseudorandom function; i.e. Luby and Rackoff constructed a 2n-bit pseudorandom permutation from four n-bit pseudorandom functions. Naor and Reingold [10] achieved an improvement in the computational complexity by using only two applications of pseudorandom functions on n bits to compute the value of a 2n-bit pseudorandom permutation. The central idea is to sandwich the two rounds of Feistel networks involving the pseudorandom functions between two pairwise independent 2n-bit permutations. The frame structure of the new cipher is motivated by this result. Fig. 1 shows the frame structure of the new cipher. In Fig. 1, $A_1 \cdot y + B_1$ and $A_2 \cdot y + B_2$ represent pairwise perfect decorrelation modules using $A_1, B_1, A_2, B_2$ as keys.

C.2 Security of Frame Structure 

Let $M = M_0^2$ be the message space of the new cipher, where $M_0 = \{0, 1\}^{64}$. Suppose the maximum differential probability of the F function is bounded by $P_{max}$. Aoki and Ohta [11] showed that the differential probability of a 2-round Feistel structure is bounded by $P_{max}$. Also, Knudsen [12] showed that the distribution of differences through the decorrelation module $A \cdot y + B$ is very key dependent, i.e. if $a \neq 0$ is an input difference of the decorrelation module $A \cdot y + B$, then the output difference is $aA$. In Fig. 1, since the frame structure of the new cipher uses two Feistel permutations as the inner 2-round transformation and the decorrelation module $A \cdot y + B$ is very key dependent, the maximum differential probability of the entire structure is $P_{max}$. This case occurs when the input difference of the F function is zero. For a given nonzero input difference of the decorrelation module, the probability that the input difference of the F function is zero is $2^{-64}$. From the attacker's point of view, he must find a characteristic with probability $P_{max}$. But this occurs with probability $2^{-64}$, and even if he finds a characteristic with probability $P_{max}$, he must still find the key $A_1$ or $A_2$, with computational complexity $2^m$, in order to attack this cipher, because he cannot know the differences of the inner round functions.
Fig. 1. Frame Structure of New Cipher

In the following, we introduce the boomerang attack and show that DONUT is secure against it. The boomerang attack was introduced by D. Wagner [2]. It is a differential attack that attempts to generate a quartet structure at an intermediate value halfway through the cipher. If the best characteristic for half of the rounds of the cipher has probability $q$, then the boomerang attack can be used in a successful attack needing $O(q^{-4})$ chosen texts. The attacker considers four plaintexts $P, P', Q, Q'$, along with their respective ciphertexts $C, C', D, D'$. Let $E(\cdot)$ represent the encryption operation, and decompose the cipher into $E = E_1 \circ E_0$, where $E_0$ represents the first half of the cipher and $E_1$ represents the last half. We will use two differential characteristics, $\Delta \to \Delta^*$ for $E_0$, as well as $\nabla \to \nabla^*$ for $E_1^{-1}$.

The attacker wants to cover the pair $P, P'$ with the characteristic for $E_0$, and to cover the pairs $P, Q$ and $P', Q'$ with the characteristic for $E_1^{-1}$. Then the pair $Q, Q'$ is perfectly set up to use the characteristic $\Delta^* \to \Delta$ for $E_0^{-1}$, as follows:

$$
\begin{aligned}
E_0(Q) \oplus E_0(Q') &= E_0(P) \oplus E_0(P') \oplus E_0(P) \oplus E_0(Q) \oplus E_0(P') \oplus E_0(Q') \\
&= E_0(P) \oplus E_0(P') \oplus E_1^{-1}(C) \oplus E_1^{-1}(D) \oplus E_1^{-1}(C') \oplus E_1^{-1}(D') \\
&= \Delta^* \oplus \nabla^* \oplus \nabla^* \\
&= \Delta^*.
\end{aligned}
$$

We define a right quartet as one where all four characteristics hold simultaneously. The only remaining issue is how to choose the texts so that they have the right differences. We can do this as follows. First, we generate $P' = P \oplus \Delta$ and get the encryptions $C, C'$ of $P, P'$ with two chosen-plaintext queries. Then we generate $D, D'$ as $D = C \oplus \nabla$ and $D' = C' \oplus \nabla$. Finally we decrypt $D, D'$ to obtain the plaintexts $Q, Q'$ with two adaptive chosen-ciphertext queries.
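
The query pattern just described can be sketched in a few lines. The code below is only an illustration: the 16-bit "cipher" is a made-up GF(2)-linear toy (a rotation followed by a key XOR), chosen because every differential then holds with probability 1, so the boomerang always returns. The names `encrypt`, `decrypt` and `boomerang_quartet` are assumptions, not part of DONUT or of Wagner's paper.

```python
MASK = 0xFFFF
TOY_KEY = 0xBEEF  # arbitrary toy key


def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def rotr16(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK


def encrypt(p):
    """Toy linear cipher: rotate left by 3, then XOR the key."""
    return rotl16(p, 3) ^ TOY_KEY


def decrypt(c):
    return rotr16(c ^ TOY_KEY, 3)


def boomerang_quartet(p, delta, nabla, enc, dec):
    """Run the four queries and report whether Q xor Q' equals delta."""
    p2 = p ^ delta                    # P' = P xor Delta
    c, c2 = enc(p), enc(p2)           # two chosen-plaintext queries
    d, d2 = c ^ nabla, c2 ^ nabla     # D = C xor Nabla, D' = C' xor Nabla
    q, q2 = dec(d), dec(d2)           # two adaptive chosen-ciphertext queries
    return (p, p2, q, q2), (q ^ q2) == delta
```

On a real cipher the final comparison succeeds only for right quartets, which is why the attack needs characteristics of high probability through both halves.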

Let $M_0$ be the first pairwise perfect decorrelation module $A_1 \cdot y + B_1$ and let $M_1$ be the second one, $A_2 \cdot y + B_2$. Let $\mathcal{F}_0$ be the first 1-round Feistel permutation of DONUT and let $\mathcal{F}_1$ be the last 1-round Feistel permutation. Take $E_0 = \mathcal{F}_0 \circ M_0$ and $E_1 = M_1 \circ \mathcal{F}_1$. Then DONUT can be represented by $E_1 \circ E_0$. Consider the characteristics of $E_0$. $\mathcal{F}_0$ has characteristics with probability 1 as follows:

$(c, 0) \to (0, c)$

where $c$ is a nonzero value and $0$ is the zero value. But this occurs only when the output difference of $M_0$ is of the form $(c, 0)$, and we can get a characteristic $(a, b) \to (c, 0)$ through $M_0$ only with probability $2^{-64}$, because we do not know the key $A_1$ of $M_0$. So we cannot apply the boomerang attack to DONUT.



D New Block Cipher DONUT 

In this section we describe the detailed structure of the new block cipher 
called DONUT(Double Operations with NUT). 

D.l Structure 

General Structure The new block cipher DONUT transforms a 128-bit plaintext block into a 128-bit ciphertext block. DONUT uses a variable key length and consists of 4 rounds. The first round and the fourth round consist of the pairwise perfect decorrelation modules $A_1 \cdot y + B_1$ and $A_2 \cdot y + B_2$, where $A_1, B_1, A_2, B_2$ are 128-bit subkeys. The inner 2-round transformation consists of two Feistel permutations, and each round uses six 32-bit subkeys (see Fig. 1).



The Round Function F The round function F of DONUT consists of three G functions. The 64-bit input of F is split into two 32-bit words, which then pass through a 3-round transformation built from the inner function G. Fig. 2 shows the structure of the F function.



The inner function G The function G is a key-dependent permutation on 
32-bit words with two 32-bit subkeys. The function G which we call the SDS 
function consists of 5 layers as follows: 

1. The first key addition layer. 

2. The first substitution layer. 

3. The diffusion layer. 

4. The second key addition layer. 

5. The second substitution layer. 

Fig. 3 shows the structure of G function. 
Fig. 2. The structure of F function

Fig. 3. The structure of G function

The Substitution Layer In the substitution layer we use a single S-box, which is an 8-bit input/output permutation. The S-box is constructed from a function of the form $a \cdot x^{-1} \oplus b$, where a = 0xo5, b = 0x37 $\in GF(2^8)$. The Galois field $GF(2^8)$ is defined by the irreducible polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (hex: 0x11d). In $GF(2^8)$, the map $x^{-1}$ has good resistance against differential and linear attacks. The purpose of the affine transform is to remove the two fixed points of $x^{-1}$, namely zero to zero and one to one.
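
A minimal sketch of such an S-box follows. The field polynomial 0x11d and the constant b = 0x37 come from the text; the multiplier `A_ASSUMED = 0xA5` is only a placeholder assumption, because the value of a is not legible in this copy. The inverse is computed as $x^{254}$, with 0 mapped to 0.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def gf_inv(x):
    """Inverse in GF(2^8) as x^254 (square-and-multiply); 0 maps to 0."""
    result, base, e = 1, x, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result if x else 0


def build_sbox(a, b=0x37):
    """S[x] = a * x^(-1) xor b for all 256 byte values."""
    return [gf_mul(a, gf_inv(x)) ^ b for x in range(256)]


A_ASSUMED = 0xA5  # placeholder: the paper's value of a is not recoverable here
SBOX = build_sbox(A_ASSUMED)
```

With any nonzero a the table is a permutation, and the fixed points of plain inversion disappear as long as b is nonzero and a xor b differs from 1: S[0] = b and S[1] = a xor b.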



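The Diffusion Layer The diffusion layer is performed by the 4 × 4 circulant matrix $D$:

$$
D = \begin{pmatrix} 01 & 06 & 07 & 02 \\ 06 & 07 & 02 & 01 \\ 07 & 02 & 01 & 06 \\ 02 & 01 & 06 & 07 \end{pmatrix}
$$

Let $X = (x_3, x_2, x_1, x_0) = \sum_{k=0}^{3} x_k 2^{8k}$ be the input of the diffusion layer and $Y = (y_3, y_2, y_1, y_0) = \sum_{k=0}^{3} y_k 2^{8k}$ be the output of the diffusion layer. Then we have the following:

$$
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 01 & 06 & 07 & 02 \\ 06 & 07 & 02 & 01 \\ 07 & 02 & 01 & 06 \\ 02 & 01 & 06 & 07 \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix}
01 \cdot x_0 \oplus 06 \cdot x_1 \oplus 07 \cdot x_2 \oplus 02 \cdot x_3 \\
06 \cdot x_0 \oplus 07 \cdot x_1 \oplus 02 \cdot x_2 \oplus 01 \cdot x_3 \\
07 \cdot x_0 \oplus 02 \cdot x_1 \oplus 01 \cdot x_2 \oplus 06 \cdot x_3 \\
02 \cdot x_0 \oplus 01 \cdot x_1 \oplus 06 \cdot x_2 \oplus 07 \cdot x_3
\end{pmatrix}
$$
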



The Key Scheduling DONUT has a flexible key length. Since the two decorrelation modules need four 128-bit subkeys and every G function needs two 32-bit (4-byte) subkeys, we need to generate 28 32-bit subkeys. Our design strategy for the key schedule is to prevent round keys from being derived from other round keys.

In the following let G(X, Y, Z) be the G function with input X, first addition key Y, and second addition key Z. The notation << n means an n-bit left shift, and <<< n (>>> n) means an n-bit left (right) rotation. Let b (≥ 16) be the key length in bytes and uk[0], uk[1], ..., uk[b − 1] be the user-supplied key. The key schedule of DONUT is as follows:

- Input: uk[0], uk[1], ..., uk[b − 1].

- Output: K[0], K[1], ..., K[27].

1. for (i = 0; i < 112; i++) L[i] = uk[i % b];

2. for (i = 5; i < 109; i++) L[i] ⊕= S[L[i − 5]] ⊕ S[L[i + 3]];

3. X[0] = 0x9e3779b9;

4. Y[0] = 0xb7e15163;

5. for (i = 0; i < 28; i++) do the following:

(a) T[i] = L[4i] | (L[4i + 1] << 8) | (L[4i + 2] << 16) | (L[4i + 3] << 24);

(b) X[i + 1] = G(X[i], (T[i] >>> 7), (T[i] <<< 5));

(c) Y[i + 1] = G(Y[i], (T[i] <<< 13), (T[i] >>> 9));

(d) K[i] = X[i + 1] ⊕ Y[i + 1] ⊕ T[i].
D.2 Efficient Implementation

For an efficient implementation we implement the function G with a T-table. The T-table (T[4][256]) stores the products T[0][i] = 01 · S[i] (= S[i]), T[1][i] = 06 · S[i], T[2][i] = 07 · S[i] and T[3][i] = 02 · S[i], where 0 ≤ i < 256. Then the first substitution layer and the diffusion layer are calculated at the same time as follows:

$$
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
= D \begin{pmatrix} S[x_0] \\ S[x_1] \\ S[x_2] \\ S[x_3] \end{pmatrix}
= \begin{pmatrix} 01 & 06 & 07 & 02 \\ 06 & 07 & 02 & 01 \\ 07 & 02 & 01 & 06 \\ 02 & 01 & 06 & 07 \end{pmatrix}
\begin{pmatrix} S[x_0] \\ S[x_1] \\ S[x_2] \\ S[x_3] \end{pmatrix}
$$

$$
= \begin{pmatrix}
01 \cdot S[x_0] \oplus 06 \cdot S[x_1] \oplus 07 \cdot S[x_2] \oplus 02 \cdot S[x_3] \\
06 \cdot S[x_0] \oplus 07 \cdot S[x_1] \oplus 02 \cdot S[x_2] \oplus 01 \cdot S[x_3] \\
07 \cdot S[x_0] \oplus 02 \cdot S[x_1] \oplus 01 \cdot S[x_2] \oplus 06 \cdot S[x_3] \\
02 \cdot S[x_0] \oplus 01 \cdot S[x_1] \oplus 06 \cdot S[x_2] \oplus 07 \cdot S[x_3]
\end{pmatrix}
= \begin{pmatrix}
T[0][x_0] \oplus T[1][x_1] \oplus T[2][x_2] \oplus T[3][x_3] \\
T[1][x_0] \oplus T[2][x_1] \oplus T[3][x_2] \oplus T[0][x_3] \\
T[2][x_0] \oplus T[3][x_1] \oplus T[0][x_2] \oplus T[1][x_3] \\
T[3][x_0] \oplus T[0][x_1] \oplus T[1][x_2] \oplus T[2][x_3]
\end{pmatrix}
$$

Since T[0] is the same as the S-box, we only need 1024 bytes of memory to implement the function G.
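
The equivalence between the direct computation and the table lookups can be checked with a short sketch. The field polynomial 0x11d, b = 0x37 and the matrix coefficients (01, 06, 07, 02) are taken from the text; the S-box multiplier 0xA5 is again a placeholder assumption, and the identity being tested holds for any byte S-box.

```python
POLY = 0x11D
COEF = (0x01, 0x06, 0x07, 0x02)   # first row of the circulant matrix D


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def gf_inv(x):
    """Inverse by exhaustive search (fine for a 256-element field); 0 maps to 0."""
    return next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)


# Placeholder S-box: a = 0xA5 is an assumption, b = 0x37 is from the text.
SBOX = [gf_mul(0xA5, gf_inv(x)) ^ 0x37 for x in range(256)]

# T[k][i] = COEF[k] * S[i]; T[0] coincides with the S-box itself.
T = [[gf_mul(c, SBOX[i]) for i in range(256)] for c in COEF]


def direct(x):
    """Substitution followed by multiplication with D; x = (x0, x1, x2, x3)."""
    s = [SBOX[v] for v in x]
    return tuple(
        gf_mul(COEF[(r + 0) % 4], s[0]) ^ gf_mul(COEF[(r + 1) % 4], s[1])
        ^ gf_mul(COEF[(r + 2) % 4], s[2]) ^ gf_mul(COEF[(r + 3) % 4], s[3])
        for r in range(4))


def via_tables(x):
    """The same result using only four table lookups and XORs per output byte."""
    return tuple(
        T[(r + 0) % 4][x[0]] ^ T[(r + 1) % 4][x[1]]
        ^ T[(r + 2) % 4][x[2]] ^ T[(r + 3) % 4][x[3]]
        for r in range(4))
```

Row r of the circulant matrix is the first row rotated left by r positions, which is why the table index is simply (r + column) mod 4.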



E Conclusion 

In this paper we suggested a new block cipher DONUT (Double Operations 

with NUT). DONUT is made by two pairwise perfect decorrelation modules in 

order to avoid boomerang attack. We also showed that DONUT is secure against 

boomerang attack. 
Abstract. We show how to efficiently generate RSA keys on a low power
handheld device with the help of an untrusted server. Most of the key 
generation work is offloaded onto the server. However, the server learns 
no information about the key it helped generate. We experiment with 
our techniques and show they result in up to a factor of 5 improvement 
in key generation time. The resulting RSA key looks like an RSA key for 
paranoids. It can be used for encryption and key exchange, but cannot 
be used for signatures. 



A Introduction 

In recent years we have seen an explosion in the number of applications for 
handheld devices. Many of these applications require the ability to communi- 
cate securely with a remote device over an authenticated channel. Example ap- 
plications include: (1) a wireless purchase using a cell phone, (2) remote secure 
synchronization with a PDA, (3) using a handheld device as an authentication
token [2], and (4) handheld electronic wallets [3]. Many of these handheld
applications require the ability to issue digital signatures on behalf of their users.

Currently, the RSA cryptosystem is the most widely used cryptosystem for 
key exchange and digital signatures: SSL commonly uses RSA-based key ex- 
change, most PKI products use RSA certificates, etc. Unfortunately, RSA on a 
low power handheld device is somewhat problematic. For example, generating a 
1024 bit RSA signature on the PalmPilot takes approximately 30 seconds. Nev- 
ertheless, since RSA is so commonly used on servers and desktops it is desirable 
to improve its performance on handhelds. 

In this paper we consider the problem of generating RSA keys. Generating 
a 1024 bit RSA key on the PalmPilot can take as long as 15 minutes. The 
device locks up while generating the key and is inaccessible to the user. For 
wireless devices battery life time is a concern. Consider a user who is given a 
new cellphone application while traveling. The application may need to generate 
a key before it can function. Generating the key while the user is traveling will 
lock up the cellphone for some time and may completely drain the batteries. 

The obvious solution is to allow the handheld to communicate with a desktop 
or server and have the server generate the key. The key can then be downloaded 
onto the handheld. The problem with this approach is that the server learns 
the user’s private key. Consequently, the server must be trusted by the user. 
This approach limits mobility of the handheld application since users can only 
generate a key while communicating with their home domain. We would like to
enable users to quickly generate an RSA key even when they cannot communicate 
with a trusted machine. 

We study the following question: can we speed up RSA key generation on
a handheld with the help of an untrusted server? Our goal is to offload most
of the key generation work onto the untrusted server. However, once the key 
is generated the server should have no information about the key it helped 
generate. This way the handheld can take advantage of the server’s processing 
power without compromising the security of its keys. 

Our best results show how to speed up the generation of unbalanced RSA 
keys. We describe these keys and explain how they are used in the next section. 
Our results come in two flavors. First, we show how to speed up key generation 
with the help of two untrusted servers. We assume that the two servers are 
unable to share information with each other. For instance, the two untrusted 
servers may be operated by different organizations. Using two untrusted servers 
we are able to speed up key generation by a factor of 5. We then show that a 
single untrusted server can be used to speed up key generation by a factor of 2. 
In Section O we discuss speeding up normal RSA key generation (as opposed to 
unbalanced keys). 

We implemented and experimented with all our algorithms. We used the 
PalmPilot as an example handheld device since it is easy to program. Clearly our 
techniques apply to any low power handheld: pagers, cell phones, MP3 players, 
PDA’s, etc. In our implementation, the PalmPilot connects to a desktop machine 
using the serial port. When a single server is used to help generate the key, the 
pilot communicates with the desktop using TCP/IP over the serial link. The 
desktop functions as the helping server. Note that there is no need to protect the 
serial connection. After all, since the desktop learns no information about the key 
it helped generate, an attacker snooping the connection will also learn nothing. 
When two servers are used, the desktop functions as a gateway enabling the pilot 
to communicate with the two servers. In this case, communication between the 
pilot and servers is protected by SSL to prevent eavesdropping by the gateway 
machine, and to prevent one server from listening in on communication intended 
for the other. Typically, the gateway machine functions as one of the two servers, 
as shown in Figure 1. 




Fig. 1. A two server configuration 

A.1 Timing Cryptographic Primitives on the PalmPilot 

For completeness we list some running times for cryptographic operations on the 
PalmPilot. We used the Palm V, which uses a 16.6 MHz Dragonball processor. 
Running times for DES, SHA-1, and RSA were obtained using a port of parts 
of SSLeay to the PalmPilot started by Ian Goldberg. 



Algorithm                      Time                     Comment
DES encryption                 4.9 ms/block
SHA-1                          2.7 ms/block
1024 bit RSA key generation    15 minutes on average
1024 bit RSA sig. generation   27.8 sec.
1024 bit RSA sig. verify       0.758 sec.               e = 3
1024 bit RSA sig. verify       1.860 sec.               e = 65537



B Preliminaries 

B.1 Overview of RSA Key Generation 

As a necessary background we give a brief overview of RSA key generation. 
Recall that an RSA key is made up of an n-bit modulus N = pq and a pair 
of integers d, called the private exponent, and e, called the public exponent. 
Typically, N is the product of two large primes, each n/2 bits long. Throughout 
the paper we focus on generating a 1024 bit key (i.e. n = 1024). The algorithm 
to generate an RSA key is as follows: 

Step 1: Repeat the following steps until two primes p, q are found: 

a. Candidate: Pick a random 512 bit candidate value p. 

b. Sieve: Using trial division, check that p is not divisible by any small 
primes (i.e. 2, 3, 5, 7, etc.). 

c. Test: Run a probabilistic primality test on the candidate. For simplicity 
one can view the test as checking that g^{(p-1)/2} = ±1 (mod p), where g 
is a random value in 1 ... p - 1. All primes will pass this test, while a 
composite will fail with overwhelming probability. 

Step 2: Compute the product N = pq (the product is 1024 bits long). 

Step 3: Pick encryption and decryption exponents e and d where e · d = 1 mod 
φ(N) and φ(N) = N - p - q + 1. 

The bulk of the key generation work takes place in step (1). Once the two 
primes p and q are found, steps (2) and (3) take negligible work. We note that 
trial division (step 1b) is frequently optimized by using a sieving algorithm. 
Sieving works as follows: once the candidate p is chosen in step (1a), the sieve 
is used to quickly find the closest integer to p that is not divisible by any small 
primes. The candidate p is then updated to be the integer found by the sieve. 
Throughout the paper we use a sieving algorithm attributed to Phil Zimmermann. 
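
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the short list of sieve primes, the number of test rounds and the fixed public exponent e = 65537 are ours and are not taken from the implementation described in this paper. The test in step (1c) is the simplified check from the text; production code would normally use Miller-Rabin instead.

```python
import math
import secrets

# Assumption: a tiny sieve table, just for illustration.
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def random_candidate(bits):
    """Step 1a: random odd integer with its top bit set."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1


def passes_sieve(p):
    """Step 1b: trial division by a few small primes."""
    return all(p % q != 0 for q in SMALL_PRIMES)


def passes_test(p, rounds=20):
    """Step 1c (simplified): g^((p-1)/2) must be +-1 mod p for random g."""
    for _ in range(rounds):
        g = 2 + secrets.randbelow(p - 3)  # g in 2 .. p-2
        if pow(g, (p - 1) // 2, p) not in (1, p - 1):
            return False
    return True


def generate_prime(bits):
    """Step 1: repeat candidate / sieve / test until a probable prime is found."""
    while True:
        p = random_candidate(bits)
        if passes_sieve(p) and passes_test(p):
            return p


def generate_rsa_key(modulus_bits=1024, e=65537):
    """Steps 1-3: returns (N, e, d). Needs Python 3.8+ for pow(e, -1, phi)."""
    while True:
        p = generate_prime(modulus_bits // 2)
        q = generate_prime(modulus_bits // 2)
        phi = p * q - p - q + 1  # phi(N) = N - p - q + 1
        if p != q and math.gcd(e, phi) == 1:
            return p * q, e, pow(e, -1, phi)
```

Calling generate_rsa_key(1024) follows the parameters of the text; a smaller size such as 128 bits is convenient for experimenting.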

Our goal is to improve the performance of step (1). Within step (1), the 
exponentiation in step (1c) dominates the running time. Our goal is to offload 
the primality test to the server without exposing any information about the 
candidate being tested. Hence, the question is: how can we test that g^{p-1} mod 
p = 1 with the help of a server without revealing any information about p? 
To do so we must show how to carry out the exponentiation while solving two 
problems: (1) hiding the modulus p, and (2) hiding the exponent p - 1. 

B.2 Unbalanced RSA Keys 

Our best results show how to speed up the generation of unbalanced RSA keys. 
An unbalanced key uses a modulus N of the form N = p · R where p is a 512 
bit prime and R is a 4096 bit random number. One can show that with high 
probability R has a prime factor that is at least 512 bits long (the probability 
that it does not have such a factor is less than 1/2^^). Consequently, the resulting 
modulus N is as hard to factor as a standard modulus N = pq. 

An unbalanced key is used in the same way as a standard RSA key. The public 
key is (e, N) and the private key is (d, N). We require that e · d = 1 mod (p - 1). 
Suppose p is m bits long. The system can be used to encrypt messages shorter 
than m bits. As in standard RSA, to encrypt a message M, whose length is much 
shorter than m bits, the sender first applies a randomized padding mechanism, 
such as OAEP [3]. The padding mechanism results in an (m - 1)-bit integer P 
(note that P < p). The sender then constructs the ciphertext by computing C = 
P^e mod N. Note that the ciphertext is as big as N. To decrypt a ciphertext C, 
the receiver first computes Cp = C mod p and then recovers P by computing P = 
Cp^d mod p. The plaintext M is then easily extracted from P. Since decryption is 
done modulo p it is as fast as standard RSA. 
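
A minimal sketch of this encrypt/decrypt cycle follows. It makes several assumptions that are not in the text: the function names are ours, e = 65537 is an arbitrary choice, the padding step (OAEP) is omitted entirely, and the key holder is assumed to keep p alongside d so that it can reduce modulo p. The prime p is supplied by the caller.

```python
import secrets


def make_unbalanced_key(p, r_bits=4096, e=65537):
    """Build N = p * R with R a random r_bits-bit number, and e*d = 1 mod (p-1)."""
    R = secrets.randbits(r_bits) | (1 << (r_bits - 1))
    N = p * R
    d = pow(e, -1, p - 1)  # Python 3.8+; raises ValueError if gcd(e, p-1) != 1
    return (e, N), (d, p)


def encrypt(P, public_key):
    """C = P^e mod N, where P is the already padded message with P < p."""
    e, N = public_key
    return pow(P, e, N)


def decrypt(C, private_key):
    """Cp = C mod p, then P = Cp^d mod p."""
    d, p = private_key
    return pow(C % p, d, p)
```

The ciphertext is as long as N (about 4608 bits for the sizes in the text), while the decryption exponentiation only involves the 512 bit prime p.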

The technique described above for using an unbalanced key is similar to 
Shamir’s “RSA for paranoids” m . It shows that unbalanced keys can be used for 
encryption/decryption and key exchange. Unfortunately, unbalanced keys cannot 
be used for digital signatures. We note that some attacks against RSA for para- 
noids have been recently proposed [S] . However, these attacks do not apply when 
one uses proper padding prior to encryption. In particular, when OAEP padding 
is used [3] the attacks cannot succeed since the security of OAEP (in the random 
oracle model) only relies on the fact that the function f : {0, ..., 2^{m-1}} -> Z_N 
defined by f(x) = x^e mod N is a one-to-one trapdoor one way function. 

C Generating an Unbalanced RSA Key with the Help of 
Untrusted Servers 

We show how RSA key generation can be significantly sped up by allowing the 
PalmPilot to interact with untrusted servers. At the end of the computation the 
servers should know nothing about the key they helped generate. We begin by 
showing how two untrusted servers can help the Pilot generate RSA keys. The 
assumption is that these two servers cannot exchange information with each 
other. To ensure that an attacker cannot eavesdrop on the network and obtain 
the information being sent to both servers, our full implementation protects the 
connection between the Pilot and the servers using SSL. Typically, the machine 
to which the pilot is connected can be used as one of the untrusted servers 
(Figure 1). We then show how to speed up key generation with the help of a 
single server. In this case there is no need to protect the connection. 

C.1 Generating Keys with the Help of Two Servers 

Our goal is to generate a modulus of the form N = pR where p is a 512-bit 
prime and R is a 4096-bit random number. To offload the primality test onto 
the servers we must hide the modulus p and the exponent p — 1. To hide the 
modulus p we intend to multiply it by a random number R and send the resulting 
N = pR to the servers. The server will perform computations modulo N = pR. 
If it turns out that p is prime, then sending N to the servers does not expose any 
information about p or R. If p is not prime we start over. To hide the exponent 
p — 1 used in the primality test we intend to share it among the two servers. 
Individually, neither one of the servers will learn any information. 

Our algorithm for generating an unbalanced RSA modulus N = pR is as 
follows. The algorithm repeats the following steps until an unbalanced key is 
generated: 

Step 1: Pilot generates a 512 bit candidate p that is not divisible by small 
primes and a 4096 bit random number R. We require that p = 3 mod 4. 

Step 2: Pilot computes N = p · R. 

Step 3: Pilot picks random integers s1 and s2 in the range [-p, p] such that 
s1 + s2 = (p - 1)/2. It also picks a random g ∈ Z*_N. 

Step 4: Pilot sends (N, g, s1) to server 1 and (N, g, -s2) to server 2. 

Step 5: Server 1 computes X1 = g^{s1} mod N. Server 2 computes X2 = 
g^{-s2} mod N. Both results X1 and X2 are sent back to the pilot. 

Step 6: Pilot checks whether X1 = ±X2 mod p. If equality holds, then N = pR 
is declared as a potential unbalanced RSA modulus. Otherwise, the algorithm 
is restarted in Step 1. 

Step 7: The Pilot locally runs a probabilistic primality test to verify that p is 
prime. This is done to ensure that the servers returned correct values. This 
step adds little time to the entire key generation process. 

First, we verify the soundness of the algorithm. In step 6 the Pilot verifies 
that g^{s1} · g^{s2} = g^{(p-1)/2} = ±1 mod p. If the test is satisfied then p is very likely 
to be prime. Then step 7 ensures that p is in fact prime and that the servers did 
not respond incorrectly. When generating a 1024 bit RSA key, a single primality 
test takes little time compared to the search for a 512 bit prime. Hence, Step 7 
adds very little to the total running time. 
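
One round of the two-server protocol (Steps 2 to 6) can be sketched as follows. The function names and the way the shares are sampled are our assumptions; the text only requires that both shares lie in [-p, p] and sum to (p - 1)/2. Both "servers" are simulated by a local function, and the sketch assumes gcd(g, N) = 1 so that the negative exponent is well defined.

```python
import secrets


def split_exponent(p):
    """Step 3: shares s1, s2 in [-p, p] with s1 + s2 = (p - 1)/2."""
    half = (p - 1) // 2
    low = half - p  # choosing s1 in [low, p] keeps s2 in [low, p] as well
    s1 = low + secrets.randbelow(p - low + 1)
    return s1, half - s1


def server_exponentiate(N, g, s):
    """Step 5: what each server computes. Negative s needs Python 3.8+."""
    return pow(g, s, N)


def pilot_accepts(p, x1, x2):
    """Step 6: accept if X1 = +-X2 mod p."""
    return (x1 - x2) % p == 0 or (x1 + x2) % p == 0


def two_server_round(p, R, g):
    """Steps 2-6 for one candidate p, with both servers simulated locally."""
    N = p * R
    s1, s2 = split_exponent(p)
    x1 = server_exponentiate(N, g, s1)   # sent to server 1: (N, g, s1)
    x2 = server_exponentiate(N, g, -s2)  # sent to server 2: (N, g, -s2)
    return pilot_accepts(p, x1, x2)
```

For a prime p the round always accepts; for the composite 91 = 7 · 13 with g = 2 it rejects, because 2^45 mod 91 = 57 is neither 1 nor -1.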

During the search for the prime p, the only computation carried out by the 
pilot is the probable prime generation and the computation of s1 and s2. The 
time to construct s1 and s2 is negligible. On the other hand, generating the 
probable prime p requires a sieve to ensure that p is not divisible by small 
factors. As we shall see in Section E, the sieve is the bottleneck. This is unusual 
since in standard RSA modulus generation sieving takes only a small fraction of 
the entire computation. We use a sieving method attributed to Phil Zimmermann. 
We note that faster sieves exist, but they would make our algorithm insecure. 

Security To analyze the security properties of the algorithm we must argue that 
the untrusted servers learn no information of value to them. During the search 
for the RSA modulus many candidates are generated. Since these candidates 
are independent of each other, any information the servers learn about rejected 
candidates does not help them in attacking the final chosen RSA modulus. Once 
the final modulus N = pR is generated in Step 2, each server is sent the value of 
N and s_i where i is either 1 or 2. The modulus N will become public anyhow (it 
is part of the public key) and hence reveals no new information. Now, assuming 
servers 1 and 2 cannot communicate, the value s1 is simply a random number 
(from Server 1’s point of view). Server 1 could have just as easily picked a random 
number in the range [-N, N] itself. Hence, s1 reveals no new information to 
Server 1 (formally, a simulation argument shows that s1 reveals at most two 
bits). The same holds for Server 2. Hence, as long as Server 1 and Server 2 
cannot communicate, no useful information is revealed about the factorization 
of N. We note that if the servers are able to communicate, they can factor N. 
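
The last remark is easy to make concrete. Server 1 holds s1 and server 2 holds -s2, and s1 + s2 = (p - 1)/2, so pooling the two values gives p directly. The function name below is ours and the sketch is only meant to illustrate why the non-collusion assumption is essential.

```python
def factor_by_collusion(N, value_at_server1, value_at_server2):
    """Recover (p, R) from the two exponents the Pilot sent out.

    value_at_server1 is s1 and value_at_server2 is -s2, so their
    difference is s1 + s2 = (p - 1)/2.
    """
    p = 2 * (value_at_server1 - value_at_server2) + 1
    if N % p != 0:
        raise ValueError("inputs do not come from the same protocol run")
    return p, N // p
```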

Performance The number of iterations until a modulus is found is identical 
to local generation of an (unbalanced) modulus on the PalmPilot. However, 
each iteration is much faster than the classic RSA key generation approach 
of Section B.1. After all, we offloaded the expensive exponentiation to a fast 
Pentium machine. As we shall see in Section E, the total running time is 
reduced by a factor of 5. 

C.2 Generating Keys with the Help of a Single Server 

Next, we show how a single untrusted server can be used to reduce the time to 
generate an RSA key on the PalmPilot. Once the key is generated, the server 
has no information regarding the key it helped generate. Typically, the pilot 
connects to the helping server directly through the serial or infrared ports. 

As before we need to compute g^{(p-1)/2} mod p to test whether p is prime. Our 
technique involves reducing the size of the exponent using the help of the server 
and hence speeding up exponentiation on the pilot. The algorithm repeats the 
following steps until an unbalanced modulus is found: 

Step 1: Pilot generates a 512 bit candidate p that is not divisible by small 
primes and a 4096 bit random number R. We require that p = 3 mod 4. 

Step 2: Pilot computes N = p · R. It picks a random g ∈ Z*_N. 

Step 3: Pilot picks a random 160 bit integer r and a random 512 bit integer a. 
It computes z = r + a(p - 1)/2. 

Step 4: Pilot sends (N, g, z) to the server. 

Step 5: The server computes X = g^z mod N and sends X back to the Pilot. 

Step 6: Pilot computes Y = g^r mod p. 

Step 7: Pilot checks if X = ±Y mod p. If so then the algorithm is finished and 
N = pR is declared as a potential unbalanced RSA modulus. Otherwise, the 
algorithm is restarted in Step 1. 

Step 8: The Pilot locally runs a probabilistic primality test to verify that p is 
prime. 

To verify soundness observe that N will make it to step 8 only if X = ±Y mod p, 
i.e. g^{r + a(p-1)/2} = ±g^r mod p. As before, this condition is always satisfied if p is 
prime. The test will fail with overwhelming probability if p is not prime. Hence, 
once step 8 is reached the modulus N = pR is very likely to be an unbalanced 
modulus. The test in Step 8 takes little time compared to the entire search for 
the 512-bit prime. 
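
The single-server round (Steps 3 to 7) can be sketched in the same style. As before, the function names are ours, the server is simulated locally, and g is assumed to be coprime to N. The only exponentiation the Pilot performs itself is the short one with the 160 bit exponent r.

```python
import secrets


def blind_exponent(p, r_bits=160, a_bits=512):
    """Step 3: z = r + a(p - 1)/2 for random r (160 bits) and a (512 bits)."""
    r = secrets.randbits(r_bits)
    a = secrets.randbits(a_bits)
    return r, r + a * ((p - 1) // 2)


def server_exponentiate(N, g, z):
    """Step 5: the server computes X = g^z mod N."""
    return pow(g, z, N)


def pilot_accepts(p, g, r, X):
    """Steps 6-7: compute Y = g^r mod p and accept if X = +-Y mod p."""
    Y = pow(g, r, p)
    return X % p in (Y, (p - Y) % p)
```

For a prime p the check always succeeds, since g^{a(p-1)/2} is then ±1 mod p. For a composite candidate the outcome depends on g and a; with p = 91, g = 2 and a = 1 the check fails.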

Performance Since we are generating an unbalanced modulus the number of 
iterations until N is found is the same as in local generation of such a modulus 
on the PalmPilot. Within each iteration the Pilot generates p and R using a sieve 
and then computes Y = g^r mod p (in step 6). However, r is only 160 bits long. 
This is much shorter than when a key is generated without the help of a server. 
In that case the Pilot has to compute an exponentiation where the exponent 
is 512 bits long. Hence, we reduce the exponentiation time by approximately a 
factor of three. Total key generation time is reduced by a factor of 2, due to the 
overhead of sieving. 

Recall that in Step 6 the Pilot computes Y = g^r mod p where r is a 160-bit 
integer. This step can be further sped up with the help of the server. Let A = 2^{40} 
and write r = r0 + r1·A + r2·A^2 + r3·A^3 where r0, r1, r2, r3 are all in the range 
[0, A]. In Step 5 the server could send back the vector R = (g^A, g^{A^2}, g^{A^3}) mod N 
in addition to sending X. Let R = (R1, R2, R3). Then in Step 6 the Pilot only 
has to compute Y = g^{r0} · R1^{r1} · R2^{r2} · R3^{r3} mod p. Using Simultaneous Multiple 
Exponentiation [7, p. 617] Step 6 can now be done in approximately half the time 
of computing Y = g^r mod p on the Pilot directly. This improvement reduces the 
total exponentiation time on the Pilot by an additional factor of 2. 
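
The following sketch shows one way to carry out this computation. It uses the textbook form of simultaneous multiple exponentiation (a table of all subset products, then one squaring per exponent bit); whether this matches the variant used in the implementation is an assumption, and all names are ours.

```python
LIMB_BITS = 40
A = 1 << LIMB_BITS  # A = 2^40, so a 160 bit r splits into four limbs


def split_limbs(r):
    """Write r = r0 + r1*A + r2*A^2 + r3*A^3 and return [r0, r1, r2, r3]."""
    return [(r >> (LIMB_BITS * i)) & (A - 1) for i in range(4)]


def server_powers(g, N):
    """Extra values the server returns in Step 5: g^A, g^(A^2), g^(A^3) mod N."""
    return [pow(g, A ** i, N) for i in (1, 2, 3)]


def simultaneous_exp(bases, exps, modulus, bits=LIMB_BITS):
    """Compute prod(bases[i] ** exps[i]) mod modulus with `bits` squarings."""
    k = len(bases)
    # table[mask] = product of bases[i] over the bits set in mask
    table = [1] * (1 << k)
    for mask in range(1, 1 << k):
        low = mask & -mask
        table[mask] = table[mask ^ low] * bases[low.bit_length() - 1] % modulus
    acc = 1
    for bit in range(bits - 1, -1, -1):
        acc = acc * acc % modulus
        mask = 0
        for i, e in enumerate(exps):
            mask |= ((e >> bit) & 1) << i
        acc = acc * table[mask] % modulus
    return acc


def pilot_short_exponentiation(g, r, p, powers_mod_N):
    """Step 6 with server help: Y = g^r0 * R1^r1 * R2^r2 * R3^r3 mod p."""
    bases = [g % p] + [x % p for x in powers_mod_N]
    return simultaneous_exp(bases, split_limbs(r), p)
```

With four bases the main loop performs 40 squarings and at most 40 multiplications modulo p, compared with roughly 160 squarings for a direct 160 bit exponentiation.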

Security In the last iteration, when the final p and R are chosen, the server 
learns the value z = a(p - 1) + r. Although z is a “random looking” 1024 bit 
number, it does contain some information about p. In particular, z mod (p - 1) is 
very small (only 160 bits long). The question is whether z helps an adversary 
break the resulting key. The best known algorithm for doing so requires 2^{m/2} 
modular exponentiations when r is m bits long. Due to our choice of 160 bits for r, 
the algorithm has security of approximately 2^{80}. This is good enough since a 
1024 bit RSA key offers security of 2^{80} anyhow. Nevertheless, the security of the 
scheme is heuristic since it depends on the assumption that no faster algorithm 
exists for factoring N given z. More precisely, the scheme depends on the 
following “(p - 1)-multiple assumption”: 

(p - 1)-multiple assumption: Let A_n be the set of integers N = pq where p 
and q are both n-bit primes. Let m be an integer so that the fastest algorithm 
for factoring a random element N ∈ A_n runs in time at least 2^{m/2}. Then the 
two distributions (N, r + a(p - 1)/2) and (N, x) cannot be distinguished with 
non-negligible probability by an algorithm whose running time is less than 2^{m/2}. 
Here N is randomly chosen in A_n, a is randomly chosen in [0, p], r is randomly 
chosen in [0, 2^m], and x is randomly chosen in [0, p^2/2]. 

Based on the (p - 1)-multiple assumption, the integer z given to the server 
contains no more statistical information than a random number in the range 
[0, p^2]. Hence, the server learns no new useful information from z. 

As before, since the generated key is an unbalanced key it can only be used 
for encryption/decryption and key exchange. It cannot be used for signatures. 



D Generating Standard RSA Keys 

One could wonder whether the techniques described in the previous sections can 
be used to speed up generation of standard RSA keys. We show that at the 
moment these techniques do not appear to improve the generation time for a 
1024 bit key. For shorter keys, e.g. 512 bits keys, we get a small improvement. 
In what follows we show how to generate a normal RSA key, N = pq, with the 
help of two servers. 

We wish to generate an RSA modulus N = pq where p and q are each 512- 
bits long. As before, we wish to offload the primality test to the servers. To do 
so we must hide the moduli p and q and the exponents p — 1 and q — 1 . The 
basic idea is to simultaneously test primality of both p and q. For each pair of 
candidates p and q the Pilot computes N = pq and sends N to the servers. The 
servers carry out the exponentiations modulo N. To hide the exponents p — 1 
and q — 1 we share them among the two servers as in the last section. 

The resulting algorithm is similar to that for generating unbalanced keys. 
In fact, the server-side is identical. The algorithm works as follows. Repeat the 
following steps until a standard RSA modulus is found: 

Step 1: Pilot generates two candidates p, q so that neither one is divisible by 
small primes. We refer to p and q as probable primes. 

Step 2: Pilot computes N = p · q and φ(N) = N - p - q + 1. Pilot picks a 
random g ∈ Z*_N. 

Step 3: Pilot picks random integers φ1 and φ2 in the range [-N, N] such that 
φ1 + φ2 = φ(N)/4. 

Step 4: Pilot sends (N, g, φ1) to server 1 and (N, g, -φ2) to server 2. 

Step 5: Server 1 computes X1 = g^{φ1} (mod N). Server 2 computes X2 = g^{-φ2} 
(mod N). Both results X1 and X2 are sent back to the pilot. 

Step 6: Pilot checks if X1 = ±X2 mod N. If so, the algorithm is finished and 
N = pq is declared as a potential RSA modulus. Otherwise, the algorithm 
is restarted in Step 1. 

Step 7: The Pilot locally runs a probabilistic primality test to verify that p and 
q are prime. This is done to ensure that the servers returned correct values. 

First, we verify soundness of the algorithm. In step 6 the Pilot is testing that 
X1 = ±X2, namely that g^{φ1} = ±g^{-φ2} mod N. That is, we check that g^{φ1+φ2} = 
g^{φ(N)/4} = ±1 mod N. Clearly, this condition holds if p and q are both primes. 
Furthermore, it will fail with overwhelming probability if either p or q is not 
prime. Hence, Step 7 is reached only if N = pq is extremely likely to be a proper 
RSA modulus. Step 7 then locally ensures that p and q are primes. 

Security To analyze the security properties of the algorithm we must argue that 
the untrusted servers learn no information of value to them. During the search 
for the RSA modulus many candidates are generated. Since these candidates 
are independent of each other, any information the servers learn about rejected 
candidates does not help them in attacking the final chosen RSA modulus. Once 
the final modulus N = pq is generated in Step 2, each server is sent the value of 
N and φ_i where i is either 1 or 2. The modulus N will become public anyhow (it 
is part of the public key) and hence reveals no new information. Now, assuming 
servers 1 and 2 cannot communicate, the value φ1 is simply a random number 
(from Server 1’s point of view). Server 1 could have just as easily picked a 
random number in the range [-N, N] itself. Hence, φ1 reveals no new information 
to Server 1. As long as Server 1 and Server 2 cannot communicate, no useful 
information is revealed about the factorization of N. If the servers are able to 
communicate, they can factor N. 

Performance Each iteration in our algorithm is much faster than the classic 
RSA key generation approach of Section B.1, since we offloaded the expensive 
exponentiation to a fast Pentium machine. Unfortunately, the number of iterations 
required until an RSA modulus is found is higher. More precisely, suppose in 
the classic approach one requires k iterations on average until a 512-bit prime is 
found. Then the total number of iterations to find two primes is 2k on average. 
In contrast, in our approach both p and q must be simultaneously prime. Hence, 
k^2 iterations are required. We refer to this effect as a quadratic slowdown. When 
generating a 1024 bit modulus the value of k is approximately 14. So even though 
we are able to speed up each iteration by a factor of 5, there are seven times as 
many iterations on average. Therefore when generating a standard 1024 bit key 
these techniques do not improve the running time. When generating a shorter 
key, e.g. a 512 bit key, the quadratic slowdown penalty is less severe since k is 
smaller (9 rather than 14). For such short keys we obtain a small improvement 
in performance. 
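
The arithmetic behind the quadratic slowdown can be checked with a short calculation. The function below is our own illustration; in particular, applying the same per-iteration speedup of 5 to 512 bit keys is an assumption made only for the sake of the example, since the text reports that figure for 1024 bit keys.

```python
def net_speedup(k, per_iteration_speedup):
    """Overall speedup when 2k classic iterations are replaced by k^2 joint ones."""
    classic_iterations = 2 * k
    joint_iterations = k * k
    return per_iteration_speedup * classic_iterations / joint_iterations


# 1024 bit modulus: k is about 14, so there are 14^2 / (2 * 14) = 7 times as
# many iterations, and a 5x faster iteration yields 5/7, i.e. a net slowdown.
slowdown_1024 = net_speedup(14, 5)

# 512 bit modulus: k is about 9, giving 5 * 18 / 81 = 10/9, a small net gain
# (assuming, hypothetically, the same 5x per-iteration speedup).
gain_512 = net_speedup(9, 5)
```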

Similarly, when generating keys with the help of a single server, the quadratic 
slowdown outweighs the reduction in time per iteration. It is an open problem 
to speed up server aided generation of standard RSA keys. 

E Experiments and Implementation Details 

The two main components of our implementation were the cryptographic and 
networking modules. SSLeay provided the cryptographic code on both the 
server (Pentium II 400 MHz) and PalmPilot side. In the case of the Pilot, we used 
SSLeay code that had been previously ported by Ian Goldberg. 

Networking We connect the pilot to a Windows NT gateway running RAS 
(Remote Access Service) through a serial-to-serial interface. The function of the 
gateway was to provide TCP/IP access to the network. In our single server 
implementation, we used the gateway as our assisting server while in our dual 
server implementation, we used the gateway and another local machine. 

E.1 Experiments 

Tables 1 and 2 show the results we obtained when generating 512, 768 and 
1024 bit RSA keys. The network traffic column measures the amount of data 
(in bytes) exchanged between the Pilot and the servers. We generate keys using 
three methods: 

(1) Local: Key generated locally on the Pilot (no interaction with a server). 

(2) One server: Pilot aided by a single untrusted server. 

(3) Two servers: Pilot aided by two untrusted servers. 

As expected we see that generating unbalanced keys with the aid of one or 
two servers leads to a performance improvement over generating keys locally on 
the PalmPilot. The rest of this section discusses these experimental results. 

We note that the key factor that determines the time it takes to generate 
an RSA key is the time per iteration (the time to sieve and exponentiate one 
probable prime p). This number is more meaningful than the total running time, 
since the total time has a high variance. More precisely, the number of iterations 
until a prime is found has high variance. Our tables state the average number of 
iterations we obtained. 

In our experiments, we carried out trial division on a candidate prime using 
the first 2048 primes (up to approximately 17,000). In all our experiments we 
observed that the server’s responses are instantaneous compared to the Pilot’s 
processing time. Consequently, improving server performance will only marginally 
affect the overall running time. 
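
As an illustration of the sieving step, the sketch below builds the table of the first 2048 primes and then searches upward from a starting value for the next odd integer with no factor in the table. It keeps one small remainder per table prime so that the large candidate is divided only once per prime. This is a generic incremental sieve written for this example; we do not claim it is the sieve used in our implementation, and the function names are ours.

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    n = 2
    while len(primes) < count:
        is_prime = True
        for q in primes:
            if q * q > n:
                break
            if n % q == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(n)
        n += 1
    return primes


def next_sieved_candidate(start, small_primes):
    """Smallest odd integer >= start that no prime in small_primes divides."""
    start |= 1
    remainders = [start % q for q in small_primes]  # one big division per prime
    delta = 0
    # Only small-integer arithmetic is needed while stepping through candidates.
    while any((rem + delta) % q == 0 for rem, q in zip(remainders, small_primes)):
        delta += 2
    return start + delta
```

A candidate returned by next_sieved_candidate is then handed to the primality test; it is a probable prime only in the weak sense that it has no small factors.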



Generating a 1024 bit key Table 1 shows detailed timing measurements for 
generating 1024 bit RSA keys. Our breakdown of timing measurements follows 
the algorithm descriptions given earlier. The first column shows the time to pick a 
probable prime, the second shows the time the Pilot spent waiting for the server to 
respond, the third shows the time to exponentiate on the PalmPilot (not used in 
the two-server mode). The last column shows the total network traffic (in bytes). 

The first two rows in Table 1 measure the time to generate keys on the Pilot. 
The first row represents the time to generate an unbalanced key, the second 
represents the time to generate a normal N = pq key. Since an unbalanced key 
requires only one prime (the other is a random number) the number of iterations 
for locally generating an unbalanced key is half that for generating a normal key. 

When comparing the time per iteration for local generation and two server 
generation, we see that using two servers we get an improvement of a factor 
of 5. Using one server we obtain an improvement of a factor of 2. The average 
number of iterations is approximately the same in all three methods. Note that 
the improvements are a result of speeding up (or eliminating) the exponentiation 
step on the PalmPilot. Observe that when two servers are used the bottleneck 
is the sieving time, i.e. the time to generate a probable prime p. 

On average, 406 iterations are needed to generate a normal RSA key (N = pq) 
with the aid of two servers. The large number of iterations is a result of the 
quadratic slowdown discussed in Section D. Even though each iteration is much 
faster than the corresponding value for local generation, we end up hurting the 
total generation time. 

Our algorithms require only a few kilobytes of data transfer between the Pilot 
and the servers. The traffic generated is linear in the number of iterations which 
explains the large figure for two server normal key generation. 

Table 1. Statistics for different key generation methods (1024 bit keys) 



Generating various key sizes From Table 2 we see that the total iteration 
time increases almost linearly with key size for dual server aided generation. 
Indeed, the dominant component of each iteration is sieving, which takes linear 
time as a function of the key size. The expected total time for generating the key 
is the product of the time-per-iteration and the expected-number-of-iterations. 

Observe that the improvement over local generation is less significant for 
shorter keys than for longer keys. The reason is that for smaller keys, the pri- 
mality test is less of a dominating factor in the running time per iteration (we 
use the same size sieve for all key sizes). Hence, reducing the exponentiation 
time has less of an effect on the total time per iteration. 

Table 2. Statistics for different key sizes 



F Conclusions 

At present, using RSA on a low power handheld is problematic. In this paper 
we study whether RSA’s performance can be improved without a loss of security. 
In particular, we ask whether an untrusted server can aid in RSA key generation. 
We wish to offload most of the work to the server without leaking any of the 
handheld’s secrets. 

We showed a significant improvement in the time it takes to generate an 
unbalanced RSA key. With the help of two isolated servers we obtained a speed 
up of a factor of 5. With the help of a single server we obtained a speed up of a 
factor of 2. For normal RSA keys, N = pq, we cannot improve the running time 
due to the quadratic slowdown problem discussed in Section D. It is an open 
problem to speed up the generation of a normal RSA key using a single server. 
In all our algorithms the load on the server is minimal; our experiments show 
that even though the server is doing most of the work, the PalmPilot does not 
produce candidates fast enough to fully occupy the server. 

We thank Certicom for providing us with SSL libraries for the PalmPilot. 

Abstract. In this paper, we propose a generalized Takagi-Cryptosystem 
with a modulus of the form p^r q^s. We study the optimal choice 
of r, s that gives the best efficiency while maintaining a prescribed 
security level, and we show that the choice of either p^r q^{r+1}, p^{r-1} q^{r+1}, or 
p^{r-2} q^{r+2}, depending on the value of r + s, is optimal. We also present 
comparison tables for the efficiency of RSA, the multiprime technology, 
Takagi’s scheme, and our proposed scheme. 



A Introduction 

The RSA system is one of the most practical public key cryptosystems. For 
security reasons, the common modulus size of the RSA system is currently at 
least 1024 bits, and the size of the modulus must be increased as factoring 
technology develops [7]. As the size of the modulus increases, the time and 
storage required to implement the RSA system will become a big hurdle to 
using it on many occasions. 

In order to improve the efficiency of the implementation, many schemes have 
been proposed. One of the approaches is to vary the form of the modulus 
of the RSA system. The most general form of the modulus n is 

n = p_1^{e_1} p_2^{e_2} ··· p_u^{e_u},  e_i ≥ 1 for 1 ≤ i ≤ u 

where the primes p_i are all distinct and about the same size. In the 
multiprime technology [3], the modulus has the form: 

n = p_1 p_2 ··· p_u,  for u ≥ 3. 
* Yie’s work was partly supported by Inha Research Fund 2000. 

In the multiprime technology, the encryption process is the same as RSA and the 
decryption is performed by using CRT (Chinese Remainder Theorem) in a parallel 
computation mode with u exponentiators. The multiprime technology relieves 
the computational complexity of the original RSA system, and has recently been 
adopted in the WTLS (Wireless Transport Layer Security) protocol. 

In [T], T. Takagi uses the modulus of the form 

n = p_1^r p_2,  for r ≥ 2. 

The encryption process of Takagi’s system is the same as RSA. For decryption, 
Takagi’s system uses his earlier method of p-adic expansion [2] to achieve 
fast decryption. Another improvement obtained by Takagi’s system is that 
the size of the private keys has been reduced. 
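
The idea behind decryption by p-adic expansion can be illustrated for the smallest case n = p^2 q. The sketch below is not Takagi's algorithm as published; it is a plain Hensel-lifting version written for this example, with names of our choosing. It assumes a small public exponent e with gcd(e, p - 1) = gcd(e, q - 1) = 1 and a message that is not divisible by p.

```python
def decrypt_p2q(c, e, p, q):
    """Recover m from c = m^e mod p^2 q."""
    # Residue modulo p: one exponentiation with a (p - 1)-sized exponent.
    m0 = pow(c % p, pow(e, -1, p - 1), p)

    # Hensel step: lift m0 to a residue modulo p^2. Only the cheap public
    # exponent e is used here, no second large exponentiation.
    p2 = p * p
    err = (c - pow(m0, e, p2)) % p2          # divisible by p by construction
    k = (err // p) * pow(e * pow(m0, e - 1, p), -1, p) % p
    m_p2 = m0 + k * p

    # Residue modulo q, then recombine with the Chinese Remainder Theorem.
    m_q = pow(c % q, pow(e, -1, q - 1), q)
    h = (m_q - m_p2) * pow(p2, -1, q) % q
    return m_p2 + h * p2
```

The private exponents are only as long as p and q, which reflects the reduction in private key size mentioned above.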

In this paper, we propose a generalization of Takagi’s algorithm for the general 
modulus n = p_1^{e_1} p_2^{e_2} ··· p_u^{e_u}, and discuss the case with optimal efficiency. 

A.1 Our Result 

Our paper is organized as follows. In Section 2, we describe the proposed 
encryption and decryption process. In Section 3, we analyse the complexity of the 
decryption and determine the case with optimal efficiency. Our result says that 
the optimal efficiency can be obtained when we use two distinct prime factors 
whose respective exponents are relatively prime and as close as possible. For 
example, n = p^r q^{r+1} if the sum of the two exponents is odd, n = p^{r-1} q^{r+1} if the 
sum of exponents is 0 modulo 4, and n = p^{r-2} q^{r+2} if the sum of exponents is 
2 modulo 4. In Tables 1 and 2 below, a brief comparison of RSA with CRT, the 
multiprime technology, Takagi’s scheme, and the proposed scheme is given. To 
compare the complexity, we have analysed the number of crucial bit operations 
needed to implement each of the schemes. We only considered the case when the 
encrypting exponent e is small and we ignored the complexity concerns involved 
with e. Also we focused on the case n = p^r q^s. 
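
The rule for choosing the two exponents can be written down and checked by brute force. The sketch below encodes the three cases as we read them (sum odd, sum 0 modulo 4, sum 2 modulo 4) and compares the result with an exhaustive search for the closest pair of relatively prime exponents. The function names are ours, and the rule is only applied to sums of at least 3.

```python
from math import gcd


def optimal_exponents(t):
    """Exponent pair (r', s') with r' + s' = t according to the stated rule."""
    if t < 3:
        raise ValueError("the rule is stated here for t >= 3")
    if t % 2 == 1:
        r = (t - 1) // 2
        return r, r + 1          # n = p^r q^(r+1)
    r = t // 2
    if t % 4 == 0:
        return r - 1, r + 1      # n = p^(r-1) q^(r+1)
    return r - 2, r + 2          # n = p^(r-2) q^(r+2)


def closest_coprime_pair(t):
    """Brute force: coprime pair (a, b), a < b, a + b = t, with b - a minimal."""
    pairs = [(a, t - a) for a in range(1, t // 2 + 1) if gcd(a, t - a) == 1]
    return min(pairs, key=lambda ab: ab[1] - ab[0])
```

For example, a total of 5 prime factors gives exponents (2, 3), while a total of 6 gives (1, 5) because (2, 4) and (3, 3) are not relatively prime.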

In Section 4, we discuss the security of our proposed system with 
the modulus n = p^r q^s against the known factorization algorithms. Up to this 
point, the best known factorization methods for large numbers are Elliptic 
Curve Method(ECM [6]), Number Field Sieve(NFS |3), and Lattice Factor- 
ization Method(LFM [^). NFS is good for the modulus with smaller number of 
prime factors, and ECM and LFM are good for the modulus with many prime 
factors [^. 



Table 1. Complexity comparison of RSA, multiprime, Takagi’s and ours (non-parallel 
computing) 

Table 2. Complexity comparison of RSA, multiprime, Takagi’s and ours (parallel 
computing) 

A helpful and reliable table for the security margin for coming years and the 
corresponding size of RSA modulus has been suggested in |Z]. First we determine 
the size of n with the security margin in MIPS-Years according to their table. 
Then we decide the optimal number of prime factors of n = p^r q^s to 
defeat both the ECM and LFM methods. In the multiprime technology, they 
introduced a technique to determine the optimal number of prime factors with 
respect to both the ECM and NFS methods. We apply their method to the 
case n = p^r q^s. Suppose that we have a = log n and p, q are about the same size, 
then the optimal number t = r + s to defeat both the ECM and NFS attacks can 
be obtained by solving the following approximated equation for t: 

log y « 1.923 -^a(loga)^ — 2 log a -I- 36. 

For example, Table 3 lists moduli for our proposed scheme that give
the same security level as an RSA modulus of the same size.

Suppose that, for a fixed number a and a positive integer k, the proposed
scheme has the same security level as RSA with a modulus of size a when we
use the modulus $n = p^r q^s$ of size ka with $t_k = r + s$ prime factors, and let $t_1$
denote the corresponding number of prime factors for a modulus of size a. In Section 4,
we shall show the following approximate relation for a, k, $t_k$:

$$\sqrt{2\,\frac{ka}{t_k}\log\frac{ka}{t_k}} \approx \sqrt{2\,\frac{a}{t_1}\log\frac{a}{t_1}} - 2\log k.$$

From the above relation, we get the optimal number of factors of the modulus.

For example, the current security margin is $2.06 \times 10^{10}$ MIPS-Years and the
corresponding size of the RSA modulus is 1024 bits 0. For various modulus 
sizes, Table 4 gives the optimal choices of modulus that provide the same security level
as RSA-1024, together with the speed ratio relative to a 1024-bit RSA system.
This implies that it would be best to choose a modulus n of 2048 bits
of the form $p^4 q^5$ for our proposed system when RSA-1024 is recommended.

Table 3. Decryption speed comparison of RSA and ours (non-parallel computing)

| modulus size | our modulus | performance rate with respect to RSA with CRT |
|---|---|---|
| 1024 bits | $n = pq^2$ | 3 times faster than RSA-1024 |
| 4096 bits | $n = pq^3$ | 8 times faster than RSA-4096 |
| 8192 bits | $n = p^2q^3$ | 15 times faster than RSA-8192 |

Table 4. Speed comparison of RSA-1024 and ours with various modulus sizes

| modulus size | 1024 bits | 2048 bits | 4096 bits |
|---|---|---|---|
| form of the modulus | $n = pq^2$ | $n = p^4q^5$ | n = |
| encryption speed compared to RSA-1024 | same | 4 times slower | 8 times slower |
| decryption speed compared to RSA-1024 | 3 times faster | 10 times faster | 6 times faster |

2 Description of the Proposed Cryptosystem

In this section, we describe the proposed scheme. To determine the case
of optimal efficiency under the assumption that the sum of exponents is fixed,
one should start with the general modulus of the form $n = p_1^{e_1} p_2^{e_2} \cdots p_u^{e_u}$.
For simplicity, however, we describe only the case $n = p^r q^s$. In Section 3, we
explain why the best efficiency is obtained in the two-prime case.

2.1 Key Generation

First we generate keys for the proposed system. For given relatively prime
positive integers r, s, we proceed as follows. The large primes p, q are
generated by the same rules as in the RSA system.

1. Randomly choose large primes p, q.

2. Compute $n = p^r q^s$.

3. Compute $L = \mathrm{lcm}(p - 1, q - 1)$.

4. Randomly choose an odd integer e so that $1 < e < L$ and $\gcd(e, L) = \gcd(e, n) = 1$.

5. Compute $d = e^{-1} \pmod L$, i.e., $ed \equiv 1 \pmod L$.

6. Publish e, n as the public key, and keep d, p, q as the secret key.

It is well known that if we choose e with $\gcd(e, \varphi(n)) = 1$, then the mapping

$$E : Z_n^* \to Z_n^* \quad\text{defined by}\quad E(m) = m^e \pmod n \ \text{ for } m \in Z_n^*$$

is a permutation of $Z_n^*$. For $n = p^r q^s$, the above choice of e
in our proposed system gives a permutation of $Z_n^*$, since $\gcd(e, L) = \gcd(e, n) = 1$
implies $\gcd(e, \varphi(n)) = 1$. As in Takagi’s
system [T], the above choice of parameters p, q, e, d allows us to use shorter keys 
than in the RSA system with the same modulus size. 
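
The key generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`is_probable_prime`, `random_prime`, `generate_keys`), the Miller-Rabin test, and the small default prime size are choices made here for readability, and a real deployment would need the RSA-grade prime generation rules the text refers to. The modular inverse via `pow(e, -1, L)` assumes Python 3.8 or later.

```python
import math
import secrets


def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (illustrative helper)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    d, k = n - 1, 0
    while d % 2 == 0:
        d //= 2
        k += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random witness in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(k - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def random_prime(bits):
    """Random odd prime with exactly `bits` bits (toy sizes only)."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


def generate_keys(r, s, prime_bits=64):
    """Steps 1-6 of the text for n = p^r q^s; prime_bits is an assumed toy size."""
    if math.gcd(r, s) != 1:
        raise ValueError("r and s must be relatively prime")
    p = random_prime(prime_bits)                        # step 1
    q = random_prime(prime_bits)
    while q == p:
        q = random_prime(prime_bits)
    n = p**r * q**s                                     # step 2
    L = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)     # step 3: lcm(p-1, q-1)
    while True:                                         # step 4: odd e coprime to L and n
        e = (secrets.randbelow(L - 3) + 3) | 1          # L is even, so e stays below L
        if math.gcd(e, L) == 1 and math.gcd(e, n) == 1:
            break
    d = pow(e, -1, L)                                   # step 5
    return (e, n), (d, p, q)                            # step 6: public, secret
```

Note that d inverts e only modulo lcm(p - 1, q - 1), so when r or s exceeds 1 a single exponentiation modulo n does not decrypt; recovery of the message relies on the digit-by-digit procedure described in the Decryption subsection.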

2.2 Encryption

Now suppose we have set up the keys $n = p^r q^s$ and e, d, L. In the proposed
system, the message space and the ciphertext space are both $Z_n^*$. For a given message
m, the ciphertext C is obtained by

$$C = m^e \pmod n,$$

which is the same as in RSA, the multiprime technology, and Takagi’s system.

2.3 Decryption

In the decryption, we use the same scheme as Takagi’s system [2], but we apply 
the p-adic expansion to the factor $p^r$ and the q-adic expansion to the factor
$q^s$. Since p, q are distinct primes, by the Chinese Remainder Theorem we have

$$Z_n^* \cong Z_{p^r}^* \times Z_{q^s}^*.$$

When we receive a ciphertext C in $Z_n^*$, C can be split into

$$C = (A, B), \quad A \in Z_{p^r}^*, \quad B \in Z_{q^s}^*.$$

Since C is a ciphertext, $C = m^e \pmod n$ for some $m \in Z_n^*$. Similarly, m can be
split into two parts, $X \in Z_{p^r}^*$ and $Y \in Z_{q^s}^*$. It is easy to check that
$A = X^e \pmod{p^r}$ and $B = Y^e \pmod{q^s}$. Since $X \in Z_{p^r}^*$, X can be represented as

$$X = X_0 + pX_1 + p^2X_2 + \cdots + p^{r-1}X_{r-1} \pmod{p^r}$$

for some $X_i \in Z_p$ with $0 \le i \le r - 1$. Similarly we have

$$Y = Y_0 + qY_1 + q^2Y_2 + \cdots + q^{s-1}Y_{s-1} \pmod{q^s}$$

for some $Y_i \in Z_q$ with $0 \le i \le s - 1$. By symmetry, it is enough to
give a procedure to find X from a given $A = X^e \pmod{p^r}$ and e, d such that
ed = 1 (mod p— 1). This procedure is given in Takagi’s paper PQ. But we present 
the detail to analyse the complexity of this procedure. 

Now suppose $A \in Z_{p^r}^*$ is written as

$$A = A_0 + pA_1 + p^2A_2 + \cdots + p^{r-1}A_{r-1} \pmod{p^r}.$$

For $1 \le i \le r - 1$, we set

$$A[i] = A_0 + pA_1 + \cdots + p^iA_i = (X_0 + pX_1 + \cdots + p^iX_i)^e \pmod{p^{i+1}},$$

$$F_i = (X_0 + pX_1 + \cdots + p^{i-1}X_{i-1})^e.$$

Then we note that $F_r \pmod{p^r} = A$ and $A[r - 1] = A$. We also note that

$$A[i] = A_0 + pA_1 + \cdots + p^iA_i \pmod{p^{i+1}}$$

$$= (X_0 + pX_1 + \cdots + p^iX_i)^e \pmod{p^{i+1}}$$

$$= (X_0 + pX_1 + \cdots + p^{i-1}X_{i-1})^e + eX_0^{e-1}p^iX_i \pmod{p^{i+1}}$$

$$= F_i + eX_0^{e-1}p^iX_i \pmod{p^{i+1}}.$$

It is easy to see that the following values

$$X_i = e^{-1}X_0^{1-e}\,\frac{A[i] - F_i \pmod{p^{i+1}}}{p^i} \pmod p \quad \text{for } 1 \le i \le r - 1,$$

together with $X_0 = A^d \pmod p$, give X. Note that since e was chosen so that
$\gcd(e, p) = 1$ and $X_0$ is in $Z_p^*$, the terms $e^{-1}$ and $X_0^{-1}$ have a unique
meaning modulo p. We have reduced the equations to equations modulo p, which is
simpler than Takagi’s algorithm. The overall speed is not affected much, but the
procedure is simpler.

Now we estimate the complexity of computing X. The complexity of computing $X_0$ is

$$\log(d \bmod (p - 1))\,(\log p)^2 \approx (\log p)^3.$$

For $1 \le i \le r - 1$, the most time-consuming step in computing $X_i$ is

$$F_i \pmod{p^{i+1}} = (X_0 + pX_1 + \cdots + p^{i-1}X_{i-1})^e \pmod{p^{i+1}}.$$

Since we use a small exponent e, the complexity of obtaining $X_i$ is dominated by
$(\log p^{i+1})^2$. Thus the complexity of computing X is

$$(\log p)^3 + \sum_{i=2}^{r} (\log p^i)^2.$$

By a similar method, we can compute $Y \in Z_{q^s}^*$ such that $B = Y^e \pmod{q^s}$, and
the complexity of computing Y is

$$(\log q)^3 + \sum_{j=2}^{s} (\log q^j)^2.$$

Now, by using CRT, we get the unique message $m \in Z_{p^r q^s}^*$ that satisfies

$$m \equiv X \pmod{p^r} \quad\text{and}\quad m \equiv Y \pmod{q^s},$$

so we have recovered the message.
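
The decryption procedure above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names `lift_root` and `decrypt` are hypothetical, the primes 1009 and 1013 and the exponent e = 5 are toy values chosen only so the example runs instantly, and `pow(x, -1, m)` assumes Python 3.8 or later. Each loop iteration computes one digit $X_i$ from $F_i$ modulo $p^{i+1}$, as in the formula for $X_i$ in the text.

```python
from math import gcd


def lift_root(A, e, d, p, r):
    """Return X in Z_{p^r}^* with X^e = A (mod p^r), assuming e*d = 1 (mod p-1)."""
    X0 = pow(A % p, d % (p - 1), p)             # X_0 = A^d mod p
    coeff = pow(e * pow(X0, e - 1, p), -1, p)   # e^{-1} * X_0^{1-e} mod p
    X = X0
    p_i = p                                     # holds p^i for the current digit
    for _ in range(1, r):
        modulus = p_i * p                       # p^{i+1}
        F = pow(X, e, modulus)                  # F_i mod p^{i+1}
        diff = (A - F) % modulus                # A[i] - F_i, a multiple of p^i
        X_i = (diff // p_i) * coeff % p         # next p-adic digit of X
        X += X_i * p_i
        p_i = modulus
    return X


def decrypt(C, e, d, p, q, r, s):
    """Recover m from C = m^e mod p^r q^s by lifting modulo p^r and q^s, then CRT."""
    pr, qs = p**r, q**s
    X = lift_root(C % pr, e, d, p, r)
    Y = lift_root(C % qs, e, d, q, s)
    h = (Y - X) * pow(pr, -1, qs) % qs          # CRT recombination
    return X + pr * h


# Toy parameters (assumed for illustration only; far too small to be secure).
p, q, r, s = 1009, 1013, 2, 3
n = p**r * q**s
L = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
e = 5
d = pow(e, -1, L)
m = 123456789
C = pow(m, e, n)
recovered = decrypt(C, e, d, p, q, r, s)
```

Only the first digit requires a full exponentiation with the secret exponent; every later digit costs one exponentiation with the small public exponent e plus a division by a power of p, which is where the complexity estimate above comes from.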



3 The Efficiency of the Proposed Cryptosystem

In this section, we discuss the complexity of the proposed scheme
and compare the efficiency of RSA with CRT, the multiprime technology, Takagi’s
system, and our proposed scheme. Takagi’s system is the special case s = 1 of our
$n = p^r q^s$.

Encryption is the same for all of these methods, hence we consider only
the complexity of the decryption process.

3.1 The Complexity of the Decryption



The complexity of the decryption for the proposed system with $n = p^r q^s$ is
dominated by

$$(\log p)^3 + \sum_{i=2}^{r} (\log p^i)^2 + (\log q)^3 + \sum_{j=2}^{s} (\log q^j)^2$$

$$= (\log p)^3 + \sum_{i=2}^{r} i^2 (\log p)^2 + (\log q)^3 + \sum_{j=2}^{s} j^2 (\log q)^2$$

$$= 2(\log p)^3 + \left(\sum_{i=2}^{r} i^2 + \sum_{j=2}^{s} j^2\right)(\log p)^2.$$

The last equality holds because p, q are of the same size. In this estimate, the
dominant term is $2(\log p)^3$. When we use u distinct primes of the same size
(i.e., the modulus is of the form $n = p_1^{e_1} \cdots p_u^{e_u}$) in the proposed scheme, the main
complexity is $u(\log p_1)^3$. Since the sum of the exponents is fixed, we conclude that
the best efficiency is obtained in the case of two primes, i.e., $n = p^r q^s$.

We also note that the term $\sum_{i=2}^{r} i^2 + \sum_{j=2}^{s} j^2$ takes its minimum value when r
and s are about the same, if r + s is fixed. We also note that r and s need to be
relatively prime for security reasons. Hence the modulus of the form

- (case 1) $n = p^r q^{r+1}$ if the sum r + s of exponents is odd

- (case 2) $n = p^{r-1} q^{r+1}$ if the sum r + s of exponents is 0 modulo 4

- (case 3) $n = p^{r-2} q^{r+2}$ if the sum r + s of exponents is 2 modulo 4

gives the best efficiency for the modulus of the form $n = p^r q^s$.
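
The three cases can be checked by brute force. The following Python sketch (function names are illustrative, not from the paper) searches all coprime splits r + s = t with r ≤ s, picks the one minimising the sum-of-squares coefficient of $(\log p)^2$, and compares it with the closed-form rule stated in the three cases.

```python
from math import gcd


def square_sum_term(r, s):
    """Coefficient of (log p)^2: sum_{i=2}^{r} i^2 + sum_{j=2}^{s} j^2."""
    return sum(i * i for i in range(2, r + 1)) + sum(j * j for j in range(2, s + 1))


def best_split(t):
    """Coprime exponents (r, s), r <= s, r + s = t, minimising square_sum_term."""
    candidates = [(r, t - r) for r in range(1, t // 2 + 1) if gcd(r, t - r) == 1]
    return min(candidates, key=lambda rs: square_sum_term(*rs))


def predicted_split(t):
    """Exponents given by cases 1-3 of the text (meaningful for t >= 3)."""
    half = t // 2
    if t % 2 == 1:
        return (half, half + 1)          # case 1: p^r q^(r+1)
    if t % 4 == 0:
        return (half - 1, half + 1)      # case 2: p^(r-1) q^(r+1)
    return (half - 2, half + 2)          # case 3: p^(r-2) q^(r+2)
```

For instance, `best_split(5)` and `predicted_split(5)` both return the exponents of $p^2 q^3$, and the two functions agree for every t from 3 upward that the test below checks.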

Our proposed scheme is exactly the same as Takagi’s system when the sum of
exponents is 3, 4, or 6. We note that our system is faster than Takagi’s system
in all the steps of the decryption procedure. The difference between the complexities of
Takagi’s system and our proposed system is at least $\frac{2r(r^2-1)}{(2r+1)^2}(\log n)^2$ in case 1,
$\frac{(r-2)(2r+1)}{4r}(\log n)^2$ in case 2, and $\frac{(r-3)(2r^2+3r+1)}{4r^2}(\log n)^2$ in case 3. Table 5 and Table 6
illustrate the differences of the crucial complexities of Takagi’s and ours.

The decryption complexity for RSA with CRT is $\frac{1}{4}(\log n)^3$. Hence we can
say that our proposed scheme is $t^3/8$ times faster than the original RSA system
with CRT, where t = 2r + 1 in case 1, and t = 2r in cases 2 and 3.
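
The cost model of this section can be evaluated numerically. The sketch below is an illustration under stated assumptions: it takes log to mean the bit length, evaluates the decryption cost $2(\log p)^3 + (\sum_{i=2}^{r} i^2 + \sum_{j=2}^{s} j^2)(\log p)^2$ with $\log p = \log n/(r+s)$, and assumes that RSA with CRT costs two exponentiations modulo primes of half the modulus size, i.e. $2(\log n/2)^3$. The function names are hypothetical.

```python
def decryption_cost(bits, r, s):
    """Cost model of this section for n = p^r q^s with p, q of equal size."""
    log_p = bits / (r + s)
    squares = sum(i * i for i in range(2, r + 1)) + sum(j * j for j in range(2, s + 1))
    return 2 * log_p**3 + squares * log_p**2


def rsa_crt_cost(bits):
    """Assumed model for RSA with CRT: two exponentiations on bits/2-bit primes."""
    return 2 * (bits / 2) ** 3


def speedup_over_rsa_crt(bits, r, s):
    """Ratio of the RSA-with-CRT cost to the proposed scheme's cost, same modulus size."""
    return rsa_crt_cost(bits) / decryption_cost(bits, r, s)
```

With Takagi’s choice s = 1 the same function gives the cost of Takagi’s system, so `decryption_cost(bits, 4, 1)` against `decryption_cost(bits, 2, 3)` compares the two schemes for five prime factors; the cubic term is identical and only the $(\log p)^2$ coefficient differs.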

When we compute in a parallel environment, the proposed scheme is ^ faster 
than the original RSA with CRT. In the multiprime technology with the modulus 

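Table 5. Decryption complexity of ours

| case | form of the modulus | complexity |
|---|---|---|
| 1 | $n = p^r q^{r+1}$ | $\frac{2}{(2r+1)^3}(\log n)^3 + \frac{2r^3+6r^2+7r-3}{3(2r+1)^2}(\log n)^2$ |
| 2 | $n = p^{r-1} q^{r+1}$ | $\frac{1}{4r^3}(\log n)^3 + \frac{2r^3+3r^2+7r-3}{12r^2}(\log n)^2$ |
| 3 | $n = p^{r-2} q^{r+2}$ | $\frac{1}{4r^3}(\log n)^3 + \frac{2r^3+3r^2+25r+6}{12r^2}(\log n)^2$ |

Table 6. Decryption complexity of Takagi’s

| case | form of the modulus | complexity |
|---|---|---|
| 1 | $n = p^{2r} q$ | $\frac{2}{(2r+1)^3}(\log n)^3 + \frac{8r^3+6r^2+r-3}{3(2r+1)^2}(\log n)^2$ |
| 2 | $n = p^{2r-1} q$ | $\frac{1}{4r^3}(\log n)^3 + \frac{8r^3-6r^2+r-3}{12r^2}(\log n)^2$ |
| 3 | $n = p^{2r-1} q$ | $\frac{1}{4r^3}(\log n)^3 + \frac{8r^3-6r^2+r-3}{12r^2}(\log n)^2$ |
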

$n = p_1 p_2 \cdots p_t$, the decryption speed is about $t^2/4$ times faster than the RSA system
with CRT, and this is one of the fastest techniques among RSA-like cryptosystems
obtained by varying the number of prime factors of the modulus. If we assume
that the computation can be performed in parallel, then our proposed scheme
gives about the same decryption speed as the multiprime technology. But
our proposed scheme needs only two exponentiators for parallel computation,
while the multiprime technology needs t exponentiators when the modulus
is a product of t distinct primes. Thus we have

Theorem 1. For an RSA-like cryptosystem (i.e., the encryption function is
defined by e-th exponentiation in $Z_n^*$) with modulus of the form $n = p_1^{e_1} p_2^{e_2} \cdots p_u^{e_u}$,
the fastest decryption speed is obtained in the case u = 2 with $e_1, e_2$ about
the same, as long as we do not consider parallel computation. In the parallel
computing mode, this gives about the same speed as the multiprime technology,
but using only two exponentiators.

4 The Security of the Proposed System

The security of our scheme is very similar to Takagi’s system |T]. In this paper 
we consider only attacks that factor the modulus. First of all, we assume
that r, s are relatively prime, because the actual complexity of factoring
$p^r q^s$ is the complexity of factoring $p^{r/\gcd(r,s)} q^{s/\gcd(r,s)}$. In this section
we shall consider the case $n = p^r q^{r+1}$ only. The other cases are similar.

4.1 Factorization of $n = p^r q^{r+1}$

First we show that knowing pq from $n = p^r q^{r+1}$ gives the prime factors p and q
in polynomial time. Note that if we have pq and $n = p^r q^{r+1}$, then $n/(pq)^r = q$ can
be obtained directly. Since the r-th root of a positive integer can be computed in
polynomial time, p is then obtained from $p^r = n/q^{r+1}$ in polynomial time. Thus we have

Theorem 2. Knowing pq from $n = p^r q^{r+1}$ gives the prime factors p and q in
polynomial time.
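
The argument behind Theorem 2 can be sketched in a few lines of Python. The function name `factor_from_pq` is hypothetical, and the sketch takes a small shortcut that is an assumption of this example rather than the paper's wording: once q is known, it recovers p by the exact division (pq)/q instead of extracting an r-th root.

```python
def factor_from_pq(n, pq, r):
    """Given n = p^r * q^(r+1) and the product pq, return the pair (p, q)."""
    q, remainder = divmod(n, pq**r)       # n / (pq)^r = q
    if remainder != 0:
        raise ValueError("n is not of the form (pq)^r * q")
    p, remainder = divmod(pq, q)          # p = (pq) / q
    if remainder != 0:
        raise ValueError("pq is not divisible by the recovered q")
    return p, q
```

Both steps are single integer divisions, so the running time is polynomial in the size of n.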

But there is no known method of finding pq from $n = p^r q^{r+1}$ without knowing
p, q. Currently, the best known factoring algorithms for large numbers are the
Number Field Sieve (NFS) and the Elliptic Curve Method (ECM).

4.2 Proper Choices for r in $n = p^r q^{r+1}$ to Defeat LFM

In [2], Boneh et al. developed an efficient factoring algorithm, LFM, to factor
numbers of the form $n = p^r q$. They compared the complexity of LFM, ECM
and NFS. Since the complexity of NFS depends only on the size of the number
to be factored, they focused on comparing ECM and LFM to give a guideline
for choosing a proper r in $n = p^r q$. At the end of the paper [2], they posed an
interesting question about the applicability of LFM to the case $n = p^r q^s$ where r, s
are about the same size. We show that LFM can be applied to $n = p^r q^{r+1}$ by
modifying their method, and we give a similar bound for r that guarantees that
$n = p^r q^{r+1}$ is secure against LFM.

Theorem 3. Let $N = p^r q^{r+1}$ where p, q are of the same size. The factor pq can
be recovered from N, r by an algorithm with a running time of

$$\exp\left(\frac{3}{2r+1}\log(pq)\right)\cdot O(\gamma),$$

where $\gamma$ is the time it takes to run LLL on a lattice of dimension $O(r^2)$ with
entries of size $O(r \log N)$. The algorithm is deterministic, and runs in polynomial
space.

We sketch the proof. In Theorem 3.1 of [2], for $n = p^r q$ with $q < p^c$, p can
be recovered with a running time of

$$\exp\left(\frac{c+1}{r+c}\log p\right)\cdot O(\gamma).$$

If we apply this result directly to the modulus $n = p^r q^{r+1}$, the factor $q^{r+1}$ takes
the role of q, which forces $c \approx r + 1$, and the running time to find p
becomes $\exp\left(\frac{r+2}{2r+1}\log p\right)\cdot O(\gamma)$, which is exponential in $\log p$. In
fact, the proof of Theorem 3.1 of [2] does not use the primality of p, q.
We rewrite n as $n = (pq)^r q$ and apply LFM to find pq. Since q is about the size of
$(pq)^{1/2}$, we may take c = 1/2, and the running
time for finding pq from $n = p^r q^{r+1}$ using LFM is

$$\exp\left(\frac{3}{2r+1}\log(pq)\right)\cdot O(\gamma).$$

Hence when $r > \log(pq)$, we can recover pq from n in polynomial time, and
this gives p, q in polynomial time by Theorem 2.

Hence for the modulus $n = p^r q^{r+1}$, we conclude:

- When r satisfies $r > \log(pq)$, i.e., $r > 2\log p$, then $n = p^r q^{r+1}$ can be factored
in polynomial time.

- When r satisfies $r < \sqrt{2\log p}$, then ECM is more effective than LFM.

4.3 Optimal Number of Prime Factors of n to Defeat Both ECM and NFS

The complexity of NFS depends on the size of the modulus, and the complexity
of ECM depends on the size of the smallest prime factor. Since we assume that
all the prime factors have the same size, the complexity of ECM depends on the
number of prime factors of the modulus. Hence for the modulus $n = p^r q^s$,
NFS is more effective for factoring n when the number t = r + s of prime factors
of n is small, and ECM becomes more effective as t gets larger.

Our objective is to choose the optimal number t = r + s to defeat ECM for
$n = p^r q^s$. In the multiprime technology, it was observed that the optimal number
of prime factors with respect to NFS and ECM can be obtained by equating the
MIPS-Years required by NFS and by ECM to factor n.

It is well known that we have the following asymptotic estimates for the
running times of NFS and ECM to factor n, when p is one of the prime factors of n:

$$\text{MIPS-Years for NFS} \approx 1.5 \times 10^{-5} \exp\left(1.923\,\sqrt[3]{\log n\,(\log\log n)^2}\right),$$

$$\text{MIPS-Years for ECM} \approx 3 \times 10^{-15} D^2 \exp\sqrt{2\log p\,\log\log p},$$

where D is the number of decimal digits of n.

We first determine the security margin (in MIPS-Years) for current computing
technology, and then determine the size of the modulus that meets this security
margin against factoring by NFS. In fact, a table of the proper security
margin for coming years and the corresponding RSA modulus size is given
in [^. Hence if we choose proper modulus size according either to the table given 
by A.K. Lenstra and E.R. Verheul or other appropriate techniques, then the size 
of the prime factor of the modulus n that guarantees the security against ECM 
can be determined by solving the following approximated equation : 

$$3 \times 10^{-15} D^2 \exp\sqrt{2\log p\,\log\log p} \approx 1.5 \times 10^{-5} \exp\left(1.923\,\sqrt[3]{\log n\,(\log\log n)^2}\right).$$

Now we simplify the above approximated equation and we get an approxi- 
mated equation that solves the optimal number of prime factors of n with respect 
to the known factoring algorithms. 

Theorem 4. Let $n = p^r q^s$ be of a bits with p, q of the same size. Then the
solution t = r + s of

$$\sqrt{2\,\frac{a}{t}\log\frac{a}{t}} \approx 1.923\,\sqrt[3]{a(\log a)^2} - 2\log a + 36$$

gives the optimal number of prime factors of n with respect to NFS, ECM and
LFM, as long as $r < \sqrt{2\log p}$.

When we get a solution t of the above approximate equation, we choose
the form of n as one of $p^r q^{r+1}$, $p^{r-1} q^{r+1}$, or $p^{r-2} q^{r+2}$, depending on whether t is odd, 0
modulo 4, or 2 modulo 4, respectively.

For example, we can use $n = pq^2$ for a modulus of 1024 bits; $n = pq^3$ for
a modulus of 4096 bits; and $n = p^2 q^3$ for a modulus of 8192 bits, maintaining the same
security level as RSA with the same modulus size.

4.4 Choices for t = r + s for $n = p^r q^s$ That Give the Same Security as the Given RSA Modulus N

Another way to use our scheme is to vary the size of $n = p^r q^s$ according
to the recommended RSA modulus N. Usually the secure choice for the RSA
modulus is determined by the amount of work (in MIPS-Years) needed to factor the
modulus.

Suppose $W_0$ MIPS-Years are needed to factor the modulus N using NFS,
and suppose $W_0$ MIPS-Years are needed to factor $n = p^{r_1} q^{s_1}$ using ECM, where
$r_1 + s_1 = t_1$ and n has the same size as N. Let N be of a bits. Now we
select a proper number $t_k = r_k + s_k$ of prime factors of $n_k = p^{r_k} q^{s_k}$, where $n_k$ is
a modulus of size ka bits. In order to choose the optimal $t_k = r_k + s_k$, we need
to solve

$$W_0 \approx 3 \times 10^{-15} D_k^2 \exp\sqrt{2\log p\,\log\log p},$$

where $D_k$ is the number of decimal digits of $n_k$ and $\log p = ka/t_k$. But we already have that

$$W_0 \approx 3 \times 10^{-15} D^2 \exp\sqrt{2\log p\,\log\log p}$$

with $\log p = a/t_1$, where D is the number of decimal digits of n, so that $D_k \approx kD$.
Thus we get an approximate equation for $t_k$, the number of prime factors of
$n_k$, that gives security equivalent to the given modulus size a of the RSA
system.



Theorem 5. Suppose that $t_1 = r_1 + s_1$ is the optimal number of prime factors of
n of a bits that gives the same security level as the RSA system with modulus size
a bits. Then when we expand the size of the modulus by a factor of k, i.e., for a modulus
$n_k$ of ka bits, the optimal number $t_k = r_k + s_k$ for $n_k = p^{r_k} q^{s_k}$ to defeat both
ECM and NFS can be obtained by solving the following approximate equation:

$$\sqrt{2\,\frac{ka}{t_k}\log\frac{ka}{t_k}} \approx \sqrt{2\,\frac{a}{t_1}\log\frac{a}{t_1}} - 2\log k.$$

As before, when we have a solution $t_k$ of the above equation, we decide
the form of the modulus $n_k$ of size ka depending on the value of $t_k$.

We see that $t_k$ gets larger as k increases. Hence the efficiency
improvement of the scheme using $n_k = p^{r_k} q^{s_k}$ compared to RSA-(ka) is better
as k increases. But when we expand the modulus size of RSA by a factor of k, the
encryption speed of RSA-ka is $k^2$ times slower than RSA-a, and the decryption
speed of RSA-ka is $k^3$ times slower than RSA-a, theoretically. Also, as k gets
larger, we need more storage for keys, messages and the computation environment.
Hence we need to choose an appropriate k by first considering the available storage and
the required encryption speed, and then choose a proper r by solving the above
equation.

5 Conclusion

In this paper we have considered a generalization of Takagi’s cryptosystem [J 
with a generalized modulus $n = p_1^{e_1} \cdots p_u^{e_u}$, and we have proposed a public key
cryptosystem with modulus of the form $n = p^r q^s$, which is one of the fastest among
RSA-like systems with general modulus $n = p_1^{e_1} \cdots p_u^{e_u}$. We have shown that the
proposed system is faster than Takagi’s system, which in turn is faster than the RSA
system. We also investigated the optimal choices for r, s that give a prescribed
security level. Our proposed system has about the same efficiency as the multiprime
technology when parallel computing is allowed, but needs fewer
exponentiators than the multiprime technology. The multiprime scheme
has recently been adopted in a WTLS application (RSA’s BSAFE-WTLS-C),
hence our system can also be adopted in similar applications.